Note to the reader


The main language of this thesis is English. However, due to legal requirements, 
the general introduction is written in both English and French. The purpose of the 
introduction is to present the overarching issue connecting the three chapters and 
to outline their contributions to the existing literature. The acknowledgements are 
also written in both languages. 

To facilitate the consultation of references, a specific bibliography is provided 
at the end of each chapter. The appendices are located at the end of the work 
and are themselves subdivided into chapters. In contrast to the main text, the 
bibliographic references cited in the appendices are all grouped together at the end 
of the appendices. 


Note au lecteur 


La langue principale utilisée pour la rédaction de cette thèse est l’anglais.
Néanmoins, en raison de contraintes légales, l’introduction générale de cette thèse
est rédigée en anglais et en français. L’objectif de l’introduction est de présenter la
problématique d’ensemble reliant les trois chapitres et de mettre en évidence leurs
contributions à la littérature existante. Les remerciements sont également écrits
dans les deux langues. 

Afin de faciliter la consultation des références, une bibliographie spécifique est 
ajoutée à la fin de chaque chapitre. Les annexes se trouvent à la fin de l’ouvrage et
sont elles-mêmes subdivisées en chapitres. Contrairement au texte principal, les
références bibliographiques citées dans les annexes sont toutes regroupées à la fin
des annexes. 
INTRODUCTION


Information structures, from which economic agents learn and form their beliefs, 
are largely endogenous. They are often designed by individuals or organizations to 
serve specific objectives. When information is disclosed strategically, it becomes 
a powerful instrument for providing incentives. By shaping agents’ beliefs, it 
influences their actions and, as a result, the outcomes of economic, social, and 
political interactions. The recent increase in data production, combined with 
advancements in data processing, has transformed information into a pervasive 
instrument of persuasion. In modern economies, these structures manifest themselves,
for example, in the form of algorithms, such as recommendation systems
used by streaming (Che and Hörner, 2018) and matching platforms (Romanyuk
and Smolin, 2019), or predictive algorithms employed by police forces and law
enforcement agencies (Hernandez and Neeman, 2022; Ichihashi, 2023). Other
notable examples include rating (Saeedi and Shourideh, 2020; Vellodi, 2018), grading
(Boleslavsky and Cotton, 2015), certification (Zapechelnyuk, 2020), scoring
(Ball, 2019), and performance measurement systems (Georgiadis and Szentes, 
2020; Ostrizek, 2022), or even financial stress tests (Goldstein and Leitner, 2018; 
Inostroza, 2019; Orlov, Zryumov, and Skrzypacz, 2021). This dissertation studies 
how information structures are and should be designed to achieve desirable social 
goals, as well as the constraints limiting the use of information as an incentive tool. 

The idea that information is endogenous is firmly rooted in the economics and 
game theory literature. Information can be strategically revealed or concealed 
through a variety of communication protocols, which depend on technological and 
institutional contexts. In the field of strategic communication, the contribution of 
Kamenica and Gentzkow (2011) marks a significant turning point. Traditionally, 
economists classify communication protocols into three categories depending 
on whether the transmission of information is costly (Spence, 1973), non-costly 
(Crawford and Sobel, 1982), or certifiable (Milgrom, 1981; Grossman, 1981). The 
innovation brought by Kamenica and Gentzkow, in comparison to these three 
communication modes, is to endow the information designer with full commitment 
power.¹ This assumption means that the information designer determines the


1Information design obviously owes its name to mechanism design, where a similar commitment
assumption is made. Mechanism design concentrates on identifying the feasible set of institutions 
(i.e., the rules of the game) and how to optimally design them according to some social criterion, 
assuming the designer possesses full commitment power over the game that will be played by agents, 
and the players’ information structure is fixed. For instance, auction design consists in determining 
receiver’s information structure ex-ante, that is, at the moment when the realization
of the state of the world is not yet known, and cannot deviate from it when the state
of the world is realized. Much of the literature has sought to provide microeconomic
foundations for the information designer’s commitment power assumption,
which can be justified on technological, reputational, and credibility grounds (see, 
for example, Best and Quigley, 2017; Mathevet, Pearce, and Stacchetti, 2022; 
Lipnowski, Ravid, and Shishkin, 2022; Lin and Liu, 2022, as well as the references 
cited by the authors). For instance, once a technology company has designed the
recommendation algorithm used to provide information to its platform users, it 
is difficult for the company to deviate from it in the short term without incurring 
prohibitive costs. Similarly, an institution responsible for maintaining the stability 
of the financial system has every interest in establishing transparent and unalterable 
rules to maintain its reputation and, consequently, the credibility of its stress tests. 

From a theoretical standpoint, particular attention has been paid to the questions 
of feasibility and optimality: to what extent does information design allow for 
influencing behaviors, and given the range of possibilities, what is the optimal 
information structure from the designer’s perspective? Kamenica and Gentzkow 
address these two questions by proposing a characterization of the payoffs achievable
through information design and a method for systematically obtaining the
optimal structure for the designer, in the case where only one agent receives the 
information. These results have subsequently been extended in various directions, 
for instance, when information design is costly (Gentzkow and Kamenica, 2014), 
when multiple information designers are in competition (Gentzkow and Kamenica, 
2017), or to dynamic environments (Ely, 2017), as well as to the broader framework
of incomplete information games (Bergemann and Morris, 2016; Mathevet,
Perego, and Taneva, 2020).² This body of work demonstrates the vast potential
offered by information design in terms of implementable receivers’ behaviors and, 
consequently, welfare outcomes resulting from receivers’ strategic interactions. 
Indeed, the only constraint imposed by information is of a statistical nature: the 
designer can only induce beliefs that are consistent with Bayes’ rule and, therefore, 
the receivers’ prior beliefs.³ Consequently, all behaviors that are compatible with


the set of implementable allocations and how to maximize the revenue or social welfare from the 
auction, the bidders’ information structure being given. In contrast, information design holds
players’ incentives fixed, focusing solely on manipulating the game’s information structure.

2For a comprehensive presentation of these extensions, we refer to the literature reviews of 
Bergemann and Morris (2019) and Kamenica (2019). 

3This condition, called Bayes plausibility by Kamenica and Gentzkow, is sufficient when the 
designer aims to influence the behavior of a single agent. When multiple agents interact strategically 
after information disclosure, the set of beliefs that can be induced by the designer must also be 
compatible with the higher-order beliefs of the players (Mathevet et al., 2020). 
Bayesian updating are theoretically attainable for an information designer with full
commitment power.⁴ For instance, in a more concrete setting, Bergemann, Brooks,
and Morris (2015) show that, for monopolistic markets, an adequate design of the 
seller’s information structure regarding consumers’ marginal willingness to pay 
could lead to situations where the gains from trade are entirely captured by the 
seller, as well as situations where they are entirely captured by the consumers, and 
all intermediate scenarios, provided that the revealed information does not alter the 
seller’s profit compared to the situation where no information would be revealed. 
Thus, it is theoretically possible for a benevolent information designer, such as a
regulatory authority, to distribute the entire surplus to consumers solely by revealing
information to the monopolist.
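
The statistical constraint described above can be stated compactly for the single-receiver case. The notation below is an assumption introduced for illustration and is not taken from this introduction: $\Omega$ is a finite state space, $\mu_0$ the receiver's prior, $\tau$ a finitely supported distribution over posterior beliefs, and $\hat{v}(\mu)$ the designer's expected payoff when the receiver holds posterior $\mu$ and best responds to it.

```latex
% Bayes plausibility (Kamenica and Gentzkow, 2011): a distribution of
% posteriors is inducible by some information structure if and only if
% the posteriors average back to the prior.
\[
  \tau \text{ is inducible}
  \quad\Longleftrightarrow\quad
  \sum_{\mu \in \operatorname{supp}(\tau)} \tau(\mu)\,\mu \;=\; \mu_0 .
\]
% The designer's problem then reduces to choosing tau subject to this
% constraint, and its value is the concave closure of \hat{v} at the prior.
\[
  V(\mu_0)
  \;=\;
  \max_{\tau}\; \sum_{\mu \in \operatorname{supp}(\tau)} \tau(\mu)\,\hat{v}(\mu)
  \quad \text{s.t.} \quad
  \sum_{\mu \in \operatorname{supp}(\tau)} \tau(\mu)\,\mu = \mu_0 ,
  \qquad
  V(\mu_0) = (\operatorname{cav}\hat{v})(\mu_0).
\]
```

Read this way, the designer gains from disclosure exactly when the concave closure lies strictly above $\hat{v}$ at the prior.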

Although recent research has led to significant theoretical and applied advancements,
outlining the possibilities offered by information alone and describing the
properties of optimal information structures in various contexts, the design of 
information structures remains subject to numerous practical constraints, which are 
only partially accounted for by the theory. As Kamenica, Kim, and Zapechelnyuk 
(2021) emphasize in their recent editorial: 


“The basic theory [of information design] makes a number of assumptions,
which are sufficiently plausible in many contexts and have enabled
various novel insights. Yet, we think that a more flexible approach that
relaxes these assumptions would significantly enhance the applicability
of the theory. Here we focus on two of the assumptions. First, the
receivers are the standard rational players who maximize their expected
utility and make Bayesian inferences. Second, there are few or no
constraints on feasible information structures (signals, experiments).”


The first two chapters of this dissertation aim precisely to integrate into the theory
an important type of constraint, as yet unexplored in the literature, and to relax
the assumption of Bayesian belief updating by the receiver.


Information design and upstream investment incentives. Agents can change 
their behavior in response to how information is structured before it is even 
revealed. These upstream strategic behaviors limit the set of information structures 
that the designer can use, as he must ensure that they are compatible with the 


4Importantly, this conclusion rests entirely on the commitment assumption. When the sender 
has no commitment power (Crawford and Sobel, 1982), his disclosure strategy must be consistent 
with his incentive constraints additionally to the Bayesian update condition. This implies that the 
meaning of the signals generated by the sender’s strategy is determined in equilibrium, unlike the 
case with commitment where the meaning of the signals is objectively determined. 
players’ incentives upstream of information production. For example, a large
number of goods and services are subject to tests, set up by public regulators, to 
determine whether they can be approved or receive certification (such as quality 
labels or green labels). However, the way tests are designed has an upstream 
impact on firms’ incentives to participate in those tests (Rosar, 2017; Harbaugh 
and Rasmusen, 2018), or on fraud and falsification behaviors (Perez-Richet and 
Skreta, 2022a,b). Therefore, it is crucial for regulators to take these incentives 
into account when designing the tests. In chapter 1, co-authored with Eduardo 
Perez-Richet, we examine another type of upstream strategic behavior: productive 
investments.⁵ We develop an information design model where the state of the
world can be transformed upstream by an agent at a certain cost. The designer’s 
goal is to act only if the final state of the world is sufficiently high, while the agent’s 
goal is for the designer to act. This model is particularly applicable to selection 
situations. For example, in the case of tests, the state of the world corresponds to 
the quality of the product a firm is seeking to certify, and the designer is a regulator 
deciding whether or not to approve the product. The information structure then 
corresponds to a test revealing information to the regulator about the product’s 
quality after the firm’s investment. The regulator’s goal is to acquire the best 
possible information for making their decision, while the firm seeks to maximize 
the probability of their product being approved by the regulator. Our main result 
shows that the optimal information structures for the designer are binary and 
deterministic. In the case of the regulator, this corresponds to setting a quality
threshold above which the test generates a signal recommending approval of the
product and below which it generates a signal recommending rejection. This result
establishes a theoretical foundation for "pass or fail" selection rules, widely used 
in practice, and surprisingly shows that it is suboptimal to introduce randomness 
into the information structure when strategic investments can be made upstream of 
information production, contrary to the cases of participation and falsification. 
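
A minimal sketch of how a pass-or-fail rule interacts with upstream investment is given below. It is a toy illustration and not the model of chapter 1: the linear investment cost, the normalization of the agent's approval payoff to 1, and the names `threshold_test` and `best_response` are all assumptions made here for exposition.

```python
def threshold_test(quality: float, threshold: float) -> str:
    """Deterministic binary test: 'pass' at or above the threshold, 'fail' below."""
    return "pass" if quality >= threshold else "fail"


def best_response(initial_quality: float, threshold: float,
                  unit_cost: float, approval_value: float = 1.0) -> float:
    """Final quality chosen by an agent who anticipates the threshold test.

    Assumption (hypothetical): raising quality by d costs unit_cost * d,
    and being approved is worth approval_value to the agent.
    """
    if initial_quality >= threshold:
        return initial_quality  # already passes, no investment needed
    cost_to_pass = unit_cost * (threshold - initial_quality)
    # Invest just enough to reach the threshold only if approval covers the cost.
    return threshold if cost_to_pass <= approval_value else initial_quality


# An agent close to the threshold invests and passes; a distant one does not.
for q0 in (0.5, 0.1):
    q1 = best_response(q0, threshold=0.8, unit_cost=2.0)
    print(q0, "->", q1, threshold_test(q1, 0.8))
```

In this toy setting the threshold acts both as a screening device and as an investment target, which is the intuition behind studying how the test shapes behavior before any information is produced.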


Information design and motivated belief updating. Setting aside strategic 
behaviors upstream, the receiver’s downstream behavior when receiving information 
can be influenced by how they choose to interpret the information. Although 
agents learn from the information they receive, a vast literature in behavioral 
and experimental economics shows that they do not systematically do so as a 
statistician would, as prescribed by Bayes’ rule, but rather let their preferences 


5Although this chapter is formulated in terms of a resource allocation problem, we establish an 
equivalence result between this allocation mechanism design problem and the information structure 
design problem in Section 1.5.3.
dictate, to some extent, how they form their beliefs (see notably, Bénabou and
Tirole, 2016, p. 150, Epley and Gilovich, 2016, or Benjamin, 2019, section 9). We
show that this type of motivated reasoning (in the terminology of Kunda, 1990)
is likely to affect the effectiveness of information as an incentive instrument and 
how information should be revealed compared to the Bayesian case. We base our 
motivated beliefs model on the one proposed by Caplin and Leahy (2019): after
observing an informative signal, the receiver forms beliefs by weighing their
anticipatory value against their psychological cost. This modeling assumption reflects,
on the one hand, that beliefs have an intrinsic value for the agent. For example, it 
has been shown that individuals associate utility with beliefs about their self-image 
or future prospects, such as thinking they are better than others, being in good 
health, moving upward in the social ladder, etc. (Bénabou and Tirole, 2016). On the 
other hand, distorting one’s beliefs from Bayesian beliefs is psychologically costly. 
Evidence shows that agents employ sophisticated and costly mental strategies to 
protect or attain desirable beliefs, such as manipulating their own memory (see 
Bénabou, 2015; Bénabou and Tirole, 2016; Hagenbach and Koessler, 2022, and 
the references therein) or avoiding useful information (Golman, Hagmann, and 
Loewenstein, 2017), such as reliable tests for detecting serious illnesses (Oster, 
Shoulson, and Dorsey, 2013). In our model, the receiver’s beliefs thus depend 
on their preferences as a result of motivated belief updating, and overweight the 
states of the world associated with the highest potential payoff of the receiver. We 
show that distortion of the receiver’s beliefs leads to distortion of his behavior. 
Compared to a Bayesian individual, he prefers actions associated with the highest 
payoff and the highest variability of payoff, and if one action induces the highest 
payoff but another induces the highest variability of payoff, the preference for 
one or the other depends on the magnitude of the receiver’s belief distortion cost. 
Consequently, the efficacy of information as an incentive tool may vary according 
to individuals’ material stakes: persuasion is more effective, compared to the 
Bayesian case, when aimed at encouraging risky but potentially highly profitable 
behavior, and less effective when targeting more cautious behavior. We illustrate 
this result with applications, showing why information campaigns often prove 
ineffective in stimulating increased investment in preventive health treatments and 
how financial advisors can benefit from their clients’ overoptimistic beliefs. We 
also show that strategic disclosure of information to voters with heterogeneous 
partisan preferences can lead to belief polarization within the electorate. 
The distributional impacts of information design. Chapter 3 addresses a
question of a different nature from the first two chapters. The literature has mainly
focused on characterizing the payoffs achievable through information design from
an aggregate perspective. By contrast, the distributional effects of information
design, i.e., how the revelation of information affects different types of agents,
have been neglected.⁶ We examine this issue in the specific context of monopolistic
markets, where the distribution of gains from trade generated by information
is a first-order issue. Indeed, consumers are constantly leaving traces of their 
identity on the Internet, whether through their activity on social networks, their 
use of search engines, or their online purchases. The data generated and collected 
on consumers has become very valuable, as it allows companies to segment 
consumers, i.e., to pool them into subgroups based on observable characteristics
and to price discriminate according to the subgroup. As mentioned earlier in 
the introduction, Bergemann et al. (2015) show that revealing information about 
consumers’ marginal willingness to pay to the seller can induce a wide variety of 
outcomes in terms of segmentation and social surplus. In particular, it is always 
possible to segment the consumer population in a way that allocates the entirety of 
the gains from trade to them. To achieve this, segments can be created that pool 
consumers with high marginal willingness to pay together with those having a 
low marginal willingness to pay. Such pooling forces the monopolist to charge
lower prices on those segments⁷ and, as a result, lets consumers with high
marginal willingness to pay benefit from lower prices. However, a significant
drawback of this type of segmentation is that if the marginal willingness to pay 
and consumers’ wealth are positively correlated, segmentations that maximize 
consumer surplus tend to favor the wealthiest consumers. How can the seller’s 
information be designed to benefit the poorest consumers? We answer this question 
by studying a model of market segmentation in a monopolistic setting assuming 
a redistributive objective. We show that redistributive-optimal segmentations 
always generate Pareto-efficient allocations but may require granting a strictly 
positive share of the surplus to the seller. We identify the set of markets for which 
redistributive segmentation involves leaving a rent to the monopoly. We also 
develop a procedure for constructing the redistributive-optimal segmentation and 
show that, when the correlation between willingness to pay and consumer wealth 
is sufficiently high, the redistributive-optimal segmentation divides consumers 


6A first foray in this direction is the contribution of Doval and Smolin (2021). 

7Otherwise, the monopolist would shrink the extensive margin too much relative to the
intensive margin by excluding too many buyers from consumption, which would decrease its
profits.
into contiguous segments: consumers are ranked by willingness to pay and grouped
with their neighbors in that ranking. In this way, poorer consumers benefit
from lower prices than the wealthier ones. Interestingly, this contrasts sharply with 
segmentations that maximize consumer surplus without distributional concerns, 
which group wealthy and poor consumers together in order to provide lower prices 
to the wealthier ones. 
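
The pooling logic behind consumer-surplus-maximizing segmentations can be checked on a small numerical example in the spirit of Bergemann et al. (2015). The sketch below is illustrative only: the three willingness-to-pay levels, the uniform aggregate market, the particular segmentation, the function names, and the assumption that the seller breaks ties in favor of the lowest optimal price are all choices made here for exposition, not results taken from chapter 3.

```python
from fractions import Fraction as F

VALUES = (1, 2, 3)  # illustrative willingness-to-pay levels


def revenue(market, price):
    """Seller revenue per unit mass from posting `price` in `market`."""
    return price * sum(share for v, share in zip(VALUES, market) if v >= price)


def lowest_optimal_price(market):
    """Revenue-maximizing price; ties broken toward the lowest price (assumption)."""
    best = max(revenue(market, p) for p in VALUES)
    return min(p for p in VALUES if revenue(market, p) == best)


def consumer_surplus(market, price):
    """Surplus of buyers whose value is at least the posted price."""
    return sum(share * (v - price) for v, share in zip(VALUES, market) if v >= price)


def outcome(segmentation):
    """Aggregate (profit, consumer surplus) over (weight, market) pairs."""
    profit = surplus = F(0)
    for weight, market in segmentation:
        p = lowest_optimal_price(market)
        profit += weight * revenue(market, p)
        surplus += weight * consumer_surplus(market, p)
    return profit, surplus


aggregate = (F(1, 3), F(1, 3), F(1, 3))
no_information = [(F(1), aggregate)]
# Each segment pools high values with lower ones so that the lowest value
# in its support is an optimal price for the seller.
pooling = [
    (F(2, 3), (F(1, 2), F(1, 6), F(1, 3))),
    (F(1, 6), (F(0), F(1, 3), F(2, 3))),
    (F(1, 6), (F(0), F(1), F(0))),
]

print(outcome(no_information))  # profit 4/3, consumer surplus 1/3
print(outcome(pooling))         # profit 4/3, consumer surplus 2/3
```

In this example the seller's profit is unchanged at 4/3 while consumer surplus doubles from 1/3 to 2/3, and the gain accrues largely to consumers with value 3 who are pooled with low-value consumers and pay a price of 1. This is the distributional pattern that motivates the redistributive objective studied in the chapter.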
INTRODUCTION EN FRANÇAIS


Les structures informationnelles, à partir desquelles les agents économiques
apprennent et forment leurs croyances, sont en grande partie endogènes. Elles
sont souvent conçues par des individus ou des organisations afin de servir des
objectifs spécifiques. Lorsque l’information est divulguée de manière stratégique,
elle devient un puissant levier d’incitation. En influençant les croyances des
agents, elle modifie leurs actions et, par conséquent, les résultats des interactions
économiques, sociales et politiques. L’accroissement récent de la production de
données, associé à l’amélioration de leur traitement, a transformé l’information
en un instrument de persuasion omniprésent. Dans les économies modernes, les
structures d’information se matérialisent, par exemple, sous la forme d’algorithmes,
tels que les systèmes de recommandation utilisés par les plateformes de streaming
(Che and Hörner, 2018) et d’appariement (Romanyuk and Smolin, 2019), ou les
algorithmes prédictifs employés par les forces de police et les services de maintien
de l’ordre (Hernandez and Neeman, 2022; Ichihashi, 2023). Les systèmes de
notation (Saeedi and Shourideh, 2020; Vellodi, 2018), d’évaluation (Boleslavsky
and Cotton, 2015), de certification (Zapechelnyuk, 2020), d’attribution de scores
(Ball, 2019), de mesure de la performance (Georgiadis and Szentes, 2020; Ostrizek,
2022), ou encore les stress tests auxquels sont soumises les institutions financières
(Goldstein and Leitner, 2018; Inostroza, 2019; Orlov, Zryumov, and Skrzypacz,
2021) en sont d’autres exemples saillants. Cette thèse vise à étudier la manière
dont les structures d’information sont et devraient être conçues afin d’atteindre
des objectifs sociaux souhaitables, ainsi que les contraintes limitant l’utilisation de
l’information en tant qu’outil d’incitation.

L’idée que l’information est endogène est bien établie dans la littérature en
économie et en théorie des jeux. L’information peut être révélée ou dissimulée de
manière stratégique via différents protocoles de communication, qui varient selon
le contexte technologique et institutionnel. Dans le domaine de la communication
stratégique, la contribution de Kamenica and Gentzkow (2011) marque un tournant
significatif. Traditionnellement, les économistes classent la communication en
trois catégories selon que l’information est coûteuse (Spence, 1973), non coûteuse
(Crawford and Sobel, 1982) ou certifiable (Milgrom, 1981; Grossman, 1981).
L’innovation apportée par Kamenica and Gentzkow, par rapport à ces trois modes
de communication, consiste à doter le concepteur d’information d’un pouvoir
d’engagement total⁸. Cela signifie que le concepteur détermine la structure d’information
du récepteur ex-ante, c’est-à-dire à l’instant où la réalisation de l’état
du monde n’est pas encore connue, et ne peut pas la modifier lorsque l’état du
monde devient connu. Une grande partie de la littérature s’est efforcée à donner
des fondements microéconomiques à l’hypothèse du pouvoir d’engagement du
concepteur d’information, qui peut être justifiée sur des bases technologiques, de
réputation et de crédibilité (voir par exemple Best and Quigley, 2017; Mathevet,
Pearce, and Stacchetti, 2022; Lipnowski, Ravid, and Shishkin, 2022; Lin and
Liu, 2022, ainsi que les références citées par les auteurs). Par exemple, une fois
qu’une entreprise technologique a conçu l’algorithme de recommandation utilisé
pour fournir de l’information aux utilisateurs de sa plateforme, il lui est difficile
de s’en écarter à court terme sans encourir un coût prohibitif. De même, une
institution garante de la stabilité du système financier a tout intérêt à établir des
règles transparentes et intangibles pour maintenir sa réputation et, par conséquent,
la crédibilité de ses stress tests.

D’un point de vue théorique, une attention particulière a également été portée
aux questions de la faisabilité et de l’optimalité : dans quelle mesure la conception
d’information permet-elle d’influencer les comportements et, étant donné l’ensemble
des possibles, quelle est la structure d’information optimale du point de
vue du concepteur ? Kamenica and Gentzkow ont répondu à ces deux questions en
caractérisant l’ensemble des gains atteignables par la conception d’information,
ainsi qu’une méthode permettant d’obtenir de manière systématique la structure
optimale pour le concepteur, dans le cas où un seul agent reçoit l’information. Ces
résultats ont par la suite été étendus, par exemple, aux cas où la conception de
l’information est coûteuse (Gentzkow and Kamenica, 2014), où plusieurs concepteurs
d’information sont en compétition (Gentzkow and Kamenica, 2017), aux
environnements dynamiques (Ely, 2017), ainsi qu’au cadre plus général des jeux à
information incomplète (Bergemann and Morris, 2016; Mathevet, Perego, and Taneva,
2020)⁹. L’ensemble de ces travaux montre l’ampleur des possibilités offertes


8Le terme “conception de l’information” tire évidemment son nom de la “conception de
mécanismes”, où une hypothèse d’engagement similaire est faite. La conception de mécanismes se
concentre sur l’identification de l’ensemble réalisable des institutions (c’est-à-dire des règles du
jeu) et comment les concevoir de manière optimale selon un critère social particulier, en supposant
que le concepteur a un plein pouvoir d’engagement sur le jeu auquel les agents participeront et
que la structure d’information des joueurs est fixe. Par exemple, la conception d’enchères consiste
à déterminer l’ensemble des allocations réalisables et à maximiser les revenus ou le bien-être
social issus de l’enchère, la structure d’information des enchérisseurs étant donnée. En revanche,
la conception de l’information maintient les incitations des joueurs constantes, et se concentre
uniquement sur la manipulation de la structure d’information du jeu.

9Pour une présentation exhaustive de ces extensions, nous renvoyons aux revues de la littérature
de Bergemann and Morris (2019) et Kamenica (2019).
par la conception de structures d’information en termes de comportements des
récepteurs et, par conséquent, de bien-être résultant de leurs interactions. En effet,
l’unique contrainte liée à l’information est d’ordre statistique : les seules croyances
que le concepteur peut induire chez un récepteur rationnel sont celles qui sont
compatibles avec la règle de Bayes et, donc, la croyance a priori du récepteur¹⁰. Par
conséquent, tous les comportements pouvant découler d’une mise à jour bayésienne
des croyances sont théoriquement atteignables pour un concepteur d’information
disposant d’un pouvoir d’engagement total¹¹. Par exemple, dans un contexte plus
concret, Bergemann, Brooks, and Morris (2015) ont démontré que, pour les marchés
monopolistiques, une conception adéquate de la structure d’information du
vendeur concernant la disposition marginale à payer des consommateurs peut aussi
bien conduire aux situations où les gains à l’échange sont entièrement captés par
le vendeur, qu’à celle où ils sont entièrement captés par les consommateurs, ainsi
qu’à toutes les situations intermédiaires, à condition que l’information révélée
n’altère pas le profit du vendeur par rapport à la situation où il n’obtiendrait aucune
information. Ainsi, il est théoriquement possible pour un concepteur d’information
bienveillant, tel qu’une autorité de régulation, de distribuer tout le surplus aux
consommateurs en révélant seulement de l’information au monopole.

Bien que les recherches récentes aient conduit à d’importantes avancées
théoriques et appliquées, délimitant les possibilités offertes par l’information seule
et décrivant les propriétés des structures d’information optimales dans divers
contextes, la conception de structures d’information reste soumise à de nombreuses
contraintes pratiques, qui ne sont encore que partiellement prises en compte par la
théorie. Comme le soulignent Kamenica, Kim, and Zapechelnyuk (2021) dans leur
récent éditorial :


"La théorie de base [de la conception de structures d’information]
repose sur plusieurs hypothèses, qui sont suffisamment plausibles dans
divers contextes et ont conduit à de nombreuses découvertes innovantes.
Toutefois, nous estimons qu’une approche plus flexible, qui assouplirait


10Cette condition, appelée plausibilité bayésienne par Kamenica and Gentzkow, est suffisante
lorsque le concepteur vise à influencer le comportement d’un seul agent. Lorsque plusieurs agents
interagissent stratégiquement après la divulgation de l’information, l’ensemble des croyances
pouvant être induites par le concepteur doit également être compatible avec les croyances d’ordre
supérieur des joueurs (Mathevet et al., 2020).

11Il est important de noter que cette conclusion repose entièrement sur l’hypothèse de l’engagement.
Lorsque l’émetteur n’a pas de pouvoir d’engagement (Crawford and Sobel, 1982), sa stratégie
de divulgation doit être compatible avec ses contraintes d’incitation en plus de la condition de
mise à jour bayésienne. Cela implique que la signification des signaux générés par la stratégie de
l’expéditeur est déterminée à l’équilibre, contrairement au cas de l’engagement où la signification
des signaux est déterminée de manière objective.
ces hypothèses, renforcerait considérablement l’applicabilité de la théorie.
Nous nous intéressons ici à deux de ces hypothèses. Premièrement,
les récepteurs sont des acteurs rationnels standard qui maximisent
leur utilité espérée et effectuent des inférences bayésiennes. Deuxièmement,
il y a peu ou pas de contraintes sur les structures d’information
réalisables¹²."


Les deux premiers chapitres de cette thèse visent précisément à intégrer à la
théorie un nouveau type de contrainte encore inexploré par la littérature, ainsi qu’à
assouplir l’hypothèse de mise à jour bayésienne des croyances du récepteur.


Conception de l’information et incitations à l’investissement en amont. Les
agents peuvent modifier leur comportement en réaction à la manière dont l’information
est structurée avant même que celle-ci ne soit révélée. Ces comportements
stratégiques en amont de la conception de l’information limitent les structures
d’information utilisables par le concepteur, car elles doivent être compatibles avec
les incitations des joueurs en amont de la production d’information. Par exemple,
un grand nombre de biens et services sont soumis à des tests, mis en place par
des régulateurs publics, et destinés à déterminer s’ils peuvent être homologués
ou s’ils peuvent bénéficier d’une certification (tels que des labels qualité ou des
labels verts). Or, la façon dont les tests sont conçus a un impact en amont sur
les incitations des firmes à participer auxdits tests (Rosar, 2017; Harbaugh and
Rasmusen, 2018), ou encore sur les comportements de fraude et de falsification
(Perez-Richet and Skreta, 2022a,b). Il est donc crucial pour les régulateurs de
prendre en compte ces incitations lors de la conception même des tests. Dans le
chapitre 1, co-écrit avec Eduardo Perez-Richet, nous examinons un autre type
de comportement stratégique en amont : les investissements productifs¹³. Nous
élaborons un modèle de conception d’information dont la particularité est que l’état
du monde peut être modifié en amont par un agent à un certain coût. L’objectif du
concepteur est d’agir uniquement si l’état du monde final est suffisamment élevé
tandis que celui de l’agent est que le concepteur agisse. Ce modèle s’applique
particulièrement aux situations de sélection. Par exemple, dans le cas des tests, l’état
du monde correspond à la qualité du produit qu’une firme cherche à faire certifier
et le concepteur est un régulateur devant décider d’approuver ou non le produit.
La structure d’information correspond alors à un test révélant des informations


Traduction de |’ auteur. 

8Bien que ce chapitre soit formulé en termes de probleme d’allocation de ressources, nous 
établissons un résultat d’équivalence entre ce probléme de conception de mécanisme d’allocation 
et de conception de structure d’information dans la section 1.5.3. 
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au régulateur sur la qualité du produit apres investissement de |’entreprise. Le 
but du régulateur est d’acquérir la meilleure information possible pour prendre sa 
décision, tandis que l’entreprise cherche a maximiser la probabilité que son produit 
soit approuvé par le régulateur. Notre résultat principal montre que les structures 
d’information optimales pour le concepteur sont binaires et déterministes. Dans 
le cas du régulateur, cela correspond a fixer un seuil de qualité au-dela duquel un 
signal signifiant d’ approuver le produit est généré par le test et en dessous duquel un 
signal signifiant de le rejeter est généré. Ce résultat établit une fondation théorique 
pour les régles de sélection de type "réussite ou échec", largement utilisées dans 
la pratique, et montre de maniére surprenante qu’il est sous-optimal d’introduire 
de l’aléa dans la structure d’information lorsque des investissements stratégiques 
peuvent étre réalisés en amont de la production d’ information, contrairement aux 
cas de la participation et de la falsification. 


Information design and motivated belief formation. Setting aside upstream
strategic behavior, the receiver’s downstream behavior, upon receiving information,
may be influenced by the way he chooses to interpret that information. Although
agents learn from the information they receive, a vast literature in behavioral and
experimental economics shows that they do not systematically do so in the manner
of a statistician, as Bayes’ rule would prescribe, but let their preferences dictate,
to some extent, the way they form their beliefs (see, in particular, Bénabou and
Tirole, 2016, p. 150, Epley and Gilovich, 2016, or Benjamin, 2019, Section 9). We
show that this kind of motivated reasoning (to use the term of Kunda, 1990) is
likely to affect the effectiveness of information as an incentive instrument, as
well as the way information should be disclosed relative to the Bayesian case. We
base our model of motivated beliefs on the one proposed by Caplin and Leahy (2019):
after observing an informative signal, the receiver forms his beliefs by trading off
their anticipatory value against their psychological cost. This modeling assumption
reflects, on the one hand, that beliefs have an intrinsic value for the agent. For
example, individuals have been shown to derive utility from beliefs about their
self-image or their future prospects, such as thinking of themselves as better than
others, being in good health, climbing the social ladder, etc. (see Bénabou and
Tirole, 2016). On the other hand, distorting one’s beliefs away from Bayesian
beliefs is psychologically costly. Experimental evidence indeed shows that
individuals employ sophisticated and costly mental strategies to protect or attain
desirable beliefs, such as manipulating their own memory (see Bénabou, 2015;
Bénabou and Tirole, 2016; Hagenbach and Koessler, 2022, and the references cited
therein) or refusing to expose themselves to information (Golman, Hagmann, and
Loewenstein, 2017), such as reliable tests for detecting serious diseases (Oster,
Shoulson, and Dorsey, 2013). In our model, the receiver’s beliefs therefore depend
on his preferences as a result of motivated belief updating, and overweight the
states of the world associated with the highest potential payoff. We show that the
distortion of the receiver’s beliefs leads to a distortion of his behavior. Compared
with a Bayesian individual, he favors actions associated with the highest payoff and
with the highest payoff variability; if one action induces the highest payoff while
another induces the greatest payoff variability, the preference for one or the other
depends on the magnitude of the receiver’s cost of distorting beliefs. Thus, the
effectiveness of information as an incentive tool can vary with individuals’
material stakes: persuasion is more effective, relative to the Bayesian case, when
it aims to encourage risky but potentially highly profitable behavior, and less
effective when it aims to encourage more cautious behavior. We illustrate this
result with applications, showing why information campaigns often prove ineffective
at stimulating greater investment in preventive health treatments and how financial
advisors can benefit from their clients’ overly optimistic beliefs. We also show
that the strategic disclosure of information to voters with heterogeneous partisan
preferences can lead to belief polarization within the electorate.


The distributional effects of information design. Chapter 3 addresses a question
of a different nature from that of the first two chapters. The literature has mainly
focused on characterizing the outcomes attainable through information design from
an aggregate perspective. An aspect that has, by contrast, been neglected concerns
the distributional effects of information design, that is, how information
disclosure affects different types of agents.⁴ We study this question in the
specific context of monopolistic markets, where the distribution of the gains from
trade generated by information is a first-order issue. Indeed, consumers constantly
leave traces of their identity on the Internet, whether through their activity on
social networks, their use of search engines, or their online purchases. The data
generated and collected about consumers has become very valuable, since it allows
firms to segment consumers, that is, to sort them into subgroups on the basis of
observable characteristics and to price discriminate across subgroups. As mentioned
earlier in the introduction, Bergemann et al. (2015) show that revealing information
about consumers’ marginal willingness to pay can induce a wide variety of outcomes
in terms of segmentation and social surplus. In particular, it is always possible to
segment the consumer population so as to allocate the entirety of the gains from
trade to consumers. To do so, one can create segments that pool consumers whose
marginal willingness to pay is high with consumers whose marginal willingness to pay
is low. Such a segmentation forces the seller to charge lower prices on these
segments⁵ and, consequently, lets consumers with a high marginal willingness to pay
benefit from lower prices. However, a major drawback of this type of segmentation is
that, if consumers’ marginal willingness to pay and wealth are positively
correlated, the segmentations that maximize consumer surplus tend to benefit the
richest consumers. How should the seller’s information be designed so as to benefit
the poorest consumers? We answer this question by studying a model of segmentation
of a monopolistic market with a redistributive objective. We show that optimal
redistributive segmentations always generate Pareto-efficient allocations, but may
require granting a strictly positive share of the surplus to the seller. We identify
the set of markets for which redistributive segmentation implies leaving a rent to
the monopolist. We also develop a procedure for constructing the optimal
redistributive segmentation and show that, when the correlation between consumers’
willingness to pay and wealth is sufficiently high, the optimal redistributive
segmentation divides consumers into contiguous segments: consumers are ranked and
grouped in increasing order of their willingness to pay. In this way, poorer
consumers benefit from lower prices than richer ones. Interestingly, this result
contrasts sharply with the segmentations that maximize consumer surplus without
distributional concerns, which pool rich and poor consumers together so that the
richest benefit from lower prices.

⁴ A first foray of the literature in this direction is the contribution of Doval
and Smolin (2021).

⁵ Otherwise, the monopolist would reduce the extensive margin too much relative to
the intensive margin by excluding too many buyers from consumption, which would
lower its profits.
1. NON-MARKET ALLOCATION
MECHANISMS: OPTIMAL DESIGN
AND INVESTMENT INCENTIVES¹


Abstract 


We study how to optimally design selection mechanisms, accounting 
for agents’ investment incentives. A principal wishes to allocate a 
resource of homogeneous quality to a heterogeneous population of 
agents. The principal commits to a possibly random selection rule 
that depends on a one-dimensional characteristic of the agents she 
intrinsically values. Agents have a strict preference for being selected 
by the principal and may undertake a costly investment to improve their 
characteristic before it is revealed to the principal. We show that even 
if random selection rules foster agents’ investments, especially at the 
top of the characteristic distribution, deterministic “pass-fail” selection 
rules are in fact optimal. 


1.1. INTRODUCTION 


Allocating goods, services, or prizes often requires selecting agents on the basis 
of measurable characteristics. In many important contexts, including university 
admissions, the allocation of research grants, the issuance of certifications by 
regulatory agencies, and promotion decisions within organizations, selection 
procedures inherently generate incentives for agents to strategically invest in 
the characteristics on which they are evaluated. When such investments are 


¹ This chapter is joint work with Eduardo Perez-Richet. We thank Ricardo Alonso, Daniel
Barreto, Aislinn Bohren, Alexis Ghersengorin, Simon Gleyze, Jeanne Hagenbach, Emeric Henry, 
Emir Kamenica, Jan Knoepfle, Frédéric Koessler, Flavien Léger, Shengwu Li, Laurent Mathevet, 
Daniel Monte, Franz Ostrizek and Sevgi Yuksel for their valuable comments and suggestions at 
various stages of the project. We also thank all the seminar participants at Sciences Po, Paris 
School of Economics, European University Institute, CSEF — University of Naples Federico II, 
and Institute for Microeconomics at the University of Bonn. Part of this research was conducted 
while both authors were visiting the Department of Economics at the European University Institute, 
whose hospitality is gratefully acknowledged. All remaining errors are ours. This project has 
received funding from the European Research Council (ERC) under the European Union’s Horizon 
2020 research and innovation programme (grant agreement 850996 — MOREV and 101001694 — 
IMEDMC). 
possible, institutions face the challenge of designing selection mechanisms that
take into account the endogenous nature of agents’ characteristics. Accounting for 
investment incentives is therefore of paramount importance in designing effective 
selection mechanisms. We propose a theory in which agents can transform their 
characteristics at some cost in response to selection and answer the research 
question: What is the optimal selection mechanism for an institution whose 
objective is to select agents with the highest possible characteristic, taking into 
account agents’ investment incentives? 

A simple selection rule, often used in practice, is to set a pass-fail selection 
cutoff, such as exams with a pass grade or certifications with a fixed quality standard. 
Pass-fail selection induces agents just below the cutoff to invest so they can pass 
but also discourages investments for all agents above the cutoff. Intuitively, random 
selection rules could perform better by spreading investment incentives, especially 
at the top. Contrary to this intuition, our main result shows that, in fact, pass-fail 
selection is optimal when accounting for investment. By doing so, we provide a 
firmer foundation for the use of this simple class of selection rules. 

We consider the following model. A principal wishes to allocate a unit mass of 
resources to a unit mass of agents. To do so, she commits to a possibly random 
selection rule based on a one-dimensional characteristic of agents. However, she 
cannot use monetary transfers. The principal intrinsically values the characteristic 
of agents so long as it exceeds a given preference threshold. Agents have a strict 
preference for being allocated the good and can undertake a costly investment to 
improve their characteristic before it is revealed. Finally, the selection rule operates 
and the outcome is realized. 

We make two assumptions. First, the principal’s preference threshold lies in
the upper tail of the distribution of characteristics, so that the density decreases in
the region of interest. Second, for tractability, we assume that the investment cost
function of the agents is quadratic in the amount of investment and that the value
of allocating the good for the principal is linear in the characteristic.

To give an intuition for our result, consider deviating from a pass-fail rule with 
a given cutoff to an increasing random rule allocating the resource with non-zero 
probability above the cutoff. Randomizing the allocation has a positive effect on 
the principal’s payoff by encouraging investments at the top of the characteristic 
distribution. However, it also has a negative effect: it unduly rejects some agents
who have invested and excludes agents at the bottom of the distribution who
would have invested under the pass-fail rule. When the distribution of initial
characteristics has a decreasing density, the negative effect always dominates. 
Technically, the non-linearity of the principal’s objective created by the absence
of monetary transfers makes the characterization of optimal mechanisms difficult 
in our setting. In contrast to environments with transfers and quasi-linear utilities, 
we cannot use the standard resolution method developed in Myerson (1981). 
To establish the optimality of pass-fail selection, we start by showing that any 
optimal selection rule must be zero below the principal’s preference threshold and 
non-decreasing above. We then consider a transformation of the agents’ indirect 
utility function that we call pseudo-utility. We characterize the set of pseudo-utility 
functions that are implementable under incentive-compatible mechanisms. Using 
this characterization we then show that the original optimization program of the 
principal boils down to a problem of calculus of variations with pseudo-utility as an 
optimization variable. This problem is not standard because it involves maximizing 
a convex objective functional, and thus cannot be solved using the first-order 
approach. We prove that the set of implementable pseudo-utility functions is 
compact and convex. Hence, the Krein-Milman theorem and Bauer’s Maximum 
Principle guarantee that a solution to the variational program can be found at 
an extreme point of the domain. We provide necessary conditions on the shape 
of extreme points. Together with the tangent inequality for convex functionals, 
these necessary conditions imply that the extreme point corresponding to the 
pseudo-utility implemented by a pass-fail selection rule is an optimal solution.
The final step simply requires optimizing the principal’s payoff with respect to the 
one-dimensional selection cutoff. 

Next, we perform a comparative statics exercise with respect to the magnitude 
of agents’ investment costs. We show that the optimal cutoff decreases and that 
the mass of excluded agents at the bottom increases as investment costs increase. 
Accordingly, the designer’s equilibrium payoff decreases in the agents’ costs and 
naturally converges to the optimal payoff she would obtain if she could not induce 
any investment as costs become arbitrarily large. Under a pass-fail rule, a strictly
positive mass of agents bunches at the selection cutoff. We prove that agents
with lower types in the bunching interval are affected negatively by an increase
in investment costs while agents with higher types in the interval are affected
positively. Intuitively, for types that are already sufficiently close to the cutoff,
the direct effect of an upward scaling of the costs is smaller than the indirect
effect of the decrease in the selection cutoff, and conversely for lower types.
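
To illustrate these comparative statics, the following Python sketch computes the principal’s payoff from a pass-fail rule on a grid of cutoffs and reports the best cutoff for two cost levels. It relies on the quadratic cost specification introduced in Section 1.2, under which an agent with type θ below a cutoff invests up to it whenever 1 ≥ γ(cutoff − θ)²/2. The support [−1, 1], the linear decreasing density, the two values of γ, the grid search, and all function names are illustrative assumptions of ours, not objects taken from the analysis; indifferent agents are assumed to invest.

```python
import math

# Illustrative assumptions (not from the chapter): support of initial types.
THETA_LOW, THETA_HIGH = -1.0, 1.0


def density(theta):
    """Hypothetical decreasing density on [-1, 1]; it integrates to one."""
    return (2.0 - theta) / 4.0


def integrate(func, a, b, steps=400):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def lowest_investing_type(cutoff, gamma):
    """Lowest type willing to invest up to the cutoff.

    An agent of type theta < cutoff invests iff 1 >= gamma * (cutoff - theta)**2 / 2.
    """
    return max(THETA_LOW, cutoff - math.sqrt(2.0 / gamma))


def principal_payoff(cutoff, gamma):
    """Principal's payoff from a pass-fail rule with the given cutoff.

    Bunching types deliver the cutoff, types above the cutoff deliver their own
    type, and all remaining types are rejected (payoff zero).
    """
    bunching_mass = integrate(density, lowest_investing_type(cutoff, gamma), cutoff)
    value_above = integrate(lambda th: th * density(th), cutoff, THETA_HIGH)
    return cutoff * bunching_mass + value_above


def best_cutoff(gamma, grid_size=201):
    """Grid search over cutoffs in [0, THETA_HIGH]."""
    grid = [THETA_HIGH * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda c: principal_payoff(c, gamma))


if __name__ == "__main__":
    for gamma in (2.0, 8.0):
        c = best_cutoff(gamma)
        print(gamma, round(c, 3), round(lowest_investing_type(c, gamma), 3))
```

With these assumed primitives, the cutoff found for the higher cost level is lower and the lowest investing type is higher, in line with the comparative statics described above. The sketch is only a numerical illustration and says nothing about other densities.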

We then extend the model in three directions. First, we consider the case where 
the principal is subject to a capacity constraint, i.e., has a strictly smaller amount
of resources to allocate than the total mass of agents. Second, we consider the
problem of a planner maximizing utilitarian social welfare. We show that in both
cases, pass-fail mechanisms remain optimal. In the first case, the optimal cutoff 
is naturally higher than in the baseline solution whenever the capacity constraint 
is binding. In the case of optimal utilitarian welfare, however, accounting for 
the agents’ investment costs pushes the optimal cutoff downwards. Finally, we 
show that the optimal outcome can still be implemented when the principal’s 
commitment power is relaxed. We assume that the principal can no longer commit
to an allocation mechanism. Instead, she makes her allocation decision
based on information provided by an intermediary. The intermediary shares the 
same objective as the principal and can produce information on the results of the 
agents’ investments by committing to a statistical experiment (Blackwell, 1951, 
1953). After observing the experiment chosen by the intermediary, the agents 
choose their investment strategies. We show that the recommendation principle 
holds in this environment. This implies that the intermediary can restrict her 
choice, without loss of generality, to statistical experiments whose outcomes are 
obedient action recommendations for the principal. Because of the alignment of the 
preferences between the intermediary and the principal, the obedience constraint 
is never binding. The conditional probability of recommending to allocate under 
the intermediary’s policy can therefore be interpreted as the selection rule in our 
original problem. This exercise shows that committing to information or to a 
mechanism is equivalent in this environment. Consequently, greater commitment
power has no additional value for the principal.


Relation to the literature. The question of investment incentives in resource 
allocation mechanisms has been the subject of an extensive literature, particularly 
in the context of the “hold-up” problem.² A fundamental contribution is that
of Rogerson (1992), who shows that VCG allocation mechanisms (Vickrey, 1961;
Clarke, 1971; Groves, 1973) induce ex-ante optimal investment incentives and
thus overcome the hold-up problem. This result has been extended by Bergemann
and Välimäki (2002) to situations where agents invest in information acquisition
before participating in the mechanism, by Athey and Segal (2013) to dynamic 
environments, by Hatfield, Kojima, and Kominers (2019) and Akbarpour, Kominers, 


² The hold-up problem arises in situations where (i) the parties to a future transaction can
undertake specific sunk cost investments that affect the value of the transaction and (ii) the form and 
value of the optimal transaction may be affected by unforeseen and non-contractible contingencies. 
Historically, the hold-up problem originates from the literature on transaction costs and the nature 
of the firm (Klein, Crawford, and Alchian, 1978; Williamson, 1979) and has received a lot of 
attention from the literature on incomplete contracts (e.g., Grossman and Hart, 1986; Tirole, 1986; 
Hart and Moore, 1988; Chung, 1991; MacLeod and Malcomson, 1993; Aghion, Dewatripont, and 
Rey, 1994). 
Li, Li, and Milgrom (2022) to approximately efficient mechanisms, and by Tomoeda
(2019) to full implementation. Investment incentives have also been studied in 
more specific settings such as public procurement (Laffont and Tirole, 1986; 
Arozamena and Cantillon, 2004), revenue-maximizing auctions (Daley, Schwarz, 
and Sonin, 2012; Gershkov, Moldovanu, Strack, and Zhang, 2021), bilateral trading 
(Gul, 2001; Lau, 2008; Dilmé, 2019; Condorelli and Szentes, 2020) and matching 
mechanisms (Hatfield, Kojima, and Kominers, 2014; Hatfield, Kojima, and Narita, 
2016). For the most part, the aforementioned works consider allocation problems 
with transfers, and all consider investments as a costly action influencing the 
agents’ valuations (or costs) before participating in the mechanism. In contrast, we 
consider mechanisms without transfers where the agents’ investments do not affect 
their valuations for the object but have an intrinsic value for the designer. 

We thus also contribute to the literature analyzing the optimal design of resource 
allocation mechanisms without monetary transfers, often referred to as “non-market 
mechanisms”.³ A significant part of the literature focuses on the costly signaling
setting, in which the designer can require agents to engage in a socially wasteful 
activity (e.g., waiting in line or filling application forms) used as a screening 
device to separate high types from low types, such as in Hartline and Roughgarden 
(2008), Yoon (2011), Condorelli (2012), Chakravarty and Kaplan (2013), Ashlagi, 
Monachou, and Nikzad (2021a,b), Ottaviani (2021) or Kleiner et al. (2021), Section
4.1.⁴ Relatedly, Perez-Richet and Skreta (2022b) show that selection rules inducing
falsification are optimal when agents can misreport their types at some cost to the 
designer. Perez-Richet and Skreta (2022a), in contrast, focus on the design of 
falsification-proof selection rules. Other studies highlight the fact that correlation 
can be profitably exploited when transfers are not available. In Bhaskar and Sadler 
(2020) the designer takes advantage of the fact that allocating certain types of 
goods causes positive externalities, thus correlating agents’ valuations. Similarly, 
in Kattwinkel (2020), Kattwinkel, Niemeyer, Preusser, and Winter (2022) and 
Niemeyer and Preusser (2022) the optimal mechanisms leverage the correlation


³ It is worth noting that a parallel literature has focused on establishing the optimality of
non-market mechanisms when a trade-off between allocative efficiency and equity is involved. A 
seminal contribution in that strand of the literature is Weitzman (1977). Condorelli (2013) shows 
that non-market mechanisms are optimal when the characteristic valued by the principal is not 
sufficiently correlated with the agents’ willingness to pay, preventing her from obtaining all relevant 
information through the appropriate design of prices. Akbarpour, Dworczak, and Kominers (2020) 
extend Condorelli’s contribution to environments with a continuum of heterogeneous qualities, a 
continuum of agents, and endogenous Pareto weights reflecting the statistical correlation between 
the agents’ willingness to pay and their marginal contribution to social welfare. Kleiner, Moldovanu, 
and Strack (2021) also extend it to general matching contests. 

⁴ Ambrus and Egorov (2017) and Amador and Bagwell (2020) also characterize optimal
delegation mechanisms with signaling. 
of agents’ information. When information cannot be extracted through prices, the
mechanism designer can also rely on costly verification such as in Ben-Porath,
Dekel, and Lipman (2014), Mylovanov and Zapechelnyuk (2017), Erlanson and
Kleiner (2019), and Kattwinkel and Knoepfle (2022).⁵ In contrast to all these
papers, we analyze a model in which the designer selects agents on the basis of the
observable outcome of an investment. This constitutes the polar case to that of
signaling or falsification. Indeed, our model can be viewed as a situation in which
the agent’s costly action does not serve as a pure screening variable but is instead
intrinsically valuable.

In our setting, the optimal selection rule is deterministic. This is in stark contrast 
with optimal mechanisms under costly signaling, which generally involve random 
rationing because of binding incentive constraints (see Hartline and Roughgarden, 
2008; Yoon, 2011; Condorelli, 2012; Chakravarty and Kaplan, 2013; Ashlagi 
et al., 2021a,b; Kleiner et al., 2021). Similarly, falsification incentives induce the 
designer to use random selection rules in Perez-Richet and Skreta (2022a,b). Under 
correlated information, Kattwinkel (2020) and Niemeyer and Preusser (2022) 
show that optimal mechanisms might involve randomization and may not even be 
monotone. Threshold selection rules, however, turn out to be optimal when the
designer can verify the agents’ claims at some cost as in Ben-Porath et al. (2014),
Mylovanov and Zapechelnyuk (2017), Erlanson and Kleiner (2019) and Kattwinkel
and Knoepfle (2022).⁶ In a setting with a continuum of heterogeneous qualities and
a continuum of agents with different preference intensities, Ortoleva, Safonov, and 
Yariv (2021) show that random selection rules are optimal under both symmetric 
and asymmetric information about the agents’ preferences. Our result also resonates 
with the optimality of posted price mechanisms in settings with monetary transfers, 
proved independently by Myerson (1981) for revenue-maximizing auctions and by 
Riley and Zeckhauser (1983) in the context of monopoly pricing.⁷

Interestingly, our paper also relates to the literature on statistical discrimination 
and affirmative action.⁸ Our problem of allocation without transfers can be seen as
a generalization of Chan and Eyster (2003) and Ray and Sethi (2010) where the 


⁵ Halac and Yared (2020) conduct a similar analysis in the context of delegation.

⁶ Threshold mechanisms also turn out to be optimal in the model of delegation with costly
verification of Halac and Yared (2020). We also refer to Kováč and Mylovanov (2009) and
Kleiner et al. (2021), Section 4.2, who establish conditions under which optimal delegation
involves randomization.

⁷ Börgers, Krähmer, and Strausz (2015) give an alternative and elegant proof of that result by
showing that posted price mechanisms correspond to extreme points of the space of allocation 
functions. 

⁸ We refer to Fang and Moro (2011) for an in-depth review of that literature, and to Onuchic
(2022) for a review of recent theoretical contributions. 
distribution of test scores is endogenous. We thus provide a microfoundation for
Chan and Eyster’s restriction to monotone admission rules based on the agents’ 
investment incentive-compatibility. The selection mechanisms in the models of 
Chan and Eyster and Ray and Sethi are also deterministic. However, both papers 
recognize the possibility that color-blind affirmative action constraints can make 
the optimal rule non-monotone. To the best of our knowledge, Fryer, Loury, and 
Yuret (2007) and Fryer and Loury (2013) are the only papers to study models of 
optimal selection where applicants can undertake investments that genuinely affect
their types. Our model, however, considers a more general form of investment
technology with a continuum of skills. It is interesting to note that the optimal 
selection rule is also pass-fail in their settings. 

Our work also echoes an emerging literature at the intersection of economics 
and computer science, studying how to optimally design linear classifiers when 
the input features are manipulable by an agent. Hu, Immorlica, and Vaughan 
(2018), Ball (2019) and Frankel and Kartik (2022) study the optimal design of 
linear selection rules when the agent has the ability to falsify its type at some cost. 
More closely connected to our paper, Kleinberg and Raghavan (2020) study how to 
design a linear classifier so as to induce the agents to invest some effort to improve 
their outcomes as opposed to gaming the classifier. 

On the methodological front, we solve a problem of calculus of variations which 
shares a similar structure to convexity-constrained variational problems arising in 
monopolistic screening models (see, e.g., Rochet and Choné, 1998; Carlier, 2001; 
Manelli and Vincent, 2007; Daskalakis, Deckelbaum, and Tzamos, 2017; Kleiner 
and Manelli, 2019; Bergemann and Strack, 2022). However, instead of being linear 
as in the previously mentioned papers, the objective functional of our variational 
program is convex. We follow a similar approach to Manelli and Vincent (2007) 
and Kleiner et al. (2021) by characterizing an optimal solution among the extreme 
points of a compact convex functional space. 


1.2. MODEL


A principal (designer, she) has a unit mass of resources to allocate to a unit mass
of agents.⁹ The agents can undertake investments resulting in a new type that
is observed by the designer. The designer commits ex-ante to a selection rule
contingent on the agents’ final types. Her value from selecting an agent is his final 
type, and her outside option is to not allocate the good at all in which case she gets 


⁹ We show in Section 1.5 that our main result still holds when the designer is capacity constrained.
a null payoff. The agents only care about getting the good independently of their
types and their investment cost is increasing and convex in the type improvement. 


Types. There is a continuum of agents characterized by a type $\theta \in \Theta = [\underline{\theta}, \overline{\theta}] \subset \mathbb{R}$.
We set $\underline{\theta} < 0 < \overline{\theta}$. The total mass of agents is normalized to one. Types are
distributed according to the cumulative distribution function $F\colon \Theta \to [0,1]$. We
assume that $F$ admits a density function $f\colon \Theta \to \mathbb{R}$ which is strictly positive and
continuously differentiable on the support $\Theta$.


Investments. The agents can transform their types at some cost. Acquiring a
final type $t \in T = \mathbb{R}$ entails a cost $\gamma c(t,\theta)$ to an agent with initial type $\theta$, where
$\gamma \in \mathbb{R}_{+}$ is a scaling parameter, and:

$$ c(t,\theta) = \frac{\left(\max\{0,\, t-\theta\}\right)^{2}}{2} $$

for all $(t,\theta) \in T \times \Theta$. Under this specification for the cost function, the transformed
type can indeed be regarded as the outcome of an investment. The cost for agents
to acquire a new type is strictly positive only if the new type is higher than their initial type, and
is increasing and convex as a function of the type increase. Moreover, the cost
exhibits decreasing differences, i.e., $c(t',\theta') - c(t,\theta') \le c(t',\theta) - c(t,\theta)$ for any
$t' > t$ and $\theta' > \theta$, with strict inequality so long as $t > \theta$.¹⁰ This property of the cost
function captures heterogeneity in the investment ability of agents. It is marginally
costlier for an agent whose initial type is low to acquire a high final type than for
an agent whose initial type is already high.
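The cost specification can be checked numerically. The sketch below is only an illustration: the function names and the grid on [-2, 2] are assumptions made for the example, not part of the model. It evaluates the quadratic cost and verifies the decreasing-differences inequality on a finite grid.

```python
def cost(t: float, theta: float) -> float:
    """Unscaled investment cost c(t, theta) = max(0, t - theta)^2 / 2."""
    return max(0.0, t - theta) ** 2 / 2.0


def has_decreasing_differences(grid: list[float], tol: float = 1e-12) -> bool:
    """Check c(t', th') - c(t, th') <= c(t', th) - c(t, th) for all t' > t, th' > th."""
    for th_lo in grid:
        for th_hi in grid:
            if th_hi <= th_lo:
                continue
            for t_lo in grid:
                for t_hi in grid:
                    if t_hi <= t_lo:
                        continue
                    # Extra cost of the higher final type, for each initial type.
                    gain_high_type = cost(t_hi, th_hi) - cost(t_lo, th_hi)
                    gain_low_type = cost(t_hi, th_lo) - cost(t_lo, th_lo)
                    if gain_high_type > gain_low_type + tol:
                        return False
    return True


# Illustrative grid on [-2, 2]; the bounds are an assumption, not from the text.
GRID = [i / 4 for i in range(-8, 9)]
print(cost(1.0, 0.0))                    # cost of moving from type 0 to type 1
print(has_decreasing_differences(GRID))  # inequality holds on the whole grid
```

A grid check of this kind does not prove the property, but it is a quick way to catch a sign error when experimenting with alternative cost functions.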


Payoffs. The designer can choose to allocate or not the resource to each agent.
Her allocation choice is denoted $a \in A = \{0,1\}$ and her payoff function is given by
$v(a,t) = at$ for any $(a,t) \in A \times T$. That is, the designer’s payoff from allocating
the resource to an agent with final type $t$ is normalized to $t$ while her payoff from
not allocating the resource is set to zero.¹¹ All agents have the same preference over
allocations. They receive a payoff normalized to one upon allocation, and get zero
otherwise. The payoff of an agent with initial type $\theta$ is thus given by the allocation
choice of the designer net of the investment cost: $u(a,t,\theta) = a - \gamma c(t,\theta)$.


¹⁰Such a specification for the cost function is also present in the signaling models of Frankel and
Kartik (2019, 2022) and Ball (2019) but results in a different interpretation. In their papers, the type
of agents is multidimensional. The first dimension $\theta$ is identified with the agents’ “natural” ability
while the second dimension $\gamma$ captures their ability to “game” the designer’s selection rule.

¹¹Setting the non-allocation payoff to zero is equivalent to saying that the designer has no intrinsic
utility for the good.

Mechanisms and incentive-compatibility. The designer cannot observe $\theta$ but
knows $F$ and can observe the final type $t$ resulting from each agent’s investment.
She commits ex-ante to a selection rule $\sigma \colon T \to [0,1]$ specifying the probability
of selecting an agent with final type $t$. After observing the selection rule $\sigma$, agents
choose an investment rule $\tau \colon \Theta \to T$. Any pair $(\sigma,\tau)$ is called a mechanism. We
say that an investment rule is incentive-compatible if it maximizes the probability
of allocation net of investment costs for all initial types.


Definition 1 (Incentive-compatibility). An investment rule $\tau \colon \Theta \to T$ is incentive-compatible
under the selection rule $\sigma \colon T \to [0,1]$ if:

$$ \tau(\theta) \in \arg\max_{t \in T} \; \sigma(t) - \gamma c(t,\theta), \tag{IC} $$

for all $\theta \in \Theta$.


We say that an investment rule $\tau$ is implementable if there exists a selection
rule $\sigma$ under which $\tau$ is incentive-compatible.


Timing. The timing of the game is the following:

(i) Selection rule: The designer commits to a selection rule $\sigma$ which is
publicly observed.

(ii) Types: The agents’ types are drawn according to the cumulative distribution
function $F$.

(iii) Investments: Each agent privately observes its type $\theta$ and undertakes an
investment taking into account its cost $\gamma c(t,\theta)$ as well as the selection rule
$\sigma$. The investment results in a new type $\tau(\theta)$.

(iv) Outcome and payoffs: The designer observes $\tau(\theta)$. The mechanism $(\sigma,\tau)$
generates an outcome $x(\theta) = \sigma(\tau(\theta))$ specifying all agents’ allocation
probabilities given their investments, and payoffs are realized.


Design problem. Given a mechanism $(\sigma,\tau)$ and the density function $f$, the ex-ante
expected payoff of the designer is given by the expected final type conditional
on allocation:

$$ V(\sigma,\tau) = \int_{\underline{\theta}}^{\bar{\theta}} \tau(\theta)\, \sigma(\tau(\theta))\, f(\theta)\, d\theta. $$

The problem of the designer consists in maximizing the expected final type among
selected agents, taking into account the agents’ investment incentives. Formally,
this corresponds to the following optimization program:

$$ \underset{\sigma,\,\tau}{\text{maximize}} \;\; V(\sigma,\tau) \quad \text{subject to (IC)}. \tag{P} $$


1.3. MAIN RESULTS 


We start by distinguishing deterministic from random mechanisms and showing 
that we can restrict our analysis without loss of generality to the class of monotone 
selection rules. Pass-fail rules correspond to the class of monotone and deterministic 
selection rules. We then formulate assumptions on the designer’s preferences and 
the magnitude of the agents’ costs, and state our main result. We also explore some 
properties of the optimal mechanism. 


1.3.1. Optimality of pass-fail selection rules 


Deterministic vs. random allocations. A selection rule is called deterministic
if it allocates the resource to a strictly positive measure of types with probability
one and excludes others for sure. Conversely, a selection rule is random if it is
not deterministic, i.e., the resource is allocated with an interior probability for a
non-negligible measure of types. We give the formal definition below.


Definition 2. A selection rule $\sigma$ is deterministic if $\sigma(t) \in \{0,1\}$ for almost all
$t \in T$. A selection rule is random if there exists a (Lebesgue) measurable subset
$\tilde{T} \subset T$ with strictly positive measure such that $0 < \sigma(t) < 1$ for all $t \in \tilde{T}$.


Restriction to monotone allocations. We show in the next lemma that we can 
restrict our analysis without loss of generality to monotone increasing selection 
rules assigning zero probability to strictly negative types. 


Lemma 1 (Monotone selection rules). One can, without loss of generality, restrict
attention to mechanisms such that:

(i) $\sigma(t) = 0$ for all $t \in \,]-\infty, 0[$, and;
(ii) $\sigma$ is non-decreasing on $]0, +\infty[$.

Any selection rule satisfying properties (i) and (ii) is called monotone.


Proof. See appendix A.2.1. □

The first property is implied by optimality. Indeed, the designer always loses
from selecting agents with a type below her preference threshold. Monotonicity
is an implication of (IC). Since the investment cost function satisfies decreasing
differences, any incentive-compatible investment rule must be non-decreasing. Moreover,
the cost function is non-decreasing in the final type. As a result, letting the selection
rule $\sigma$ be decreasing on some interval would necessarily violate (IC).


Pass-fail mechanisms. A selection rule is pass-fail if there exists a cutoff $t^{\dagger}$
such that all agents with a final type of at least $t^{\dagger}$ obtain the resource with
probability one, and all agents whose final types are strictly less than $t^{\dagger}$ obtain the
resource with probability zero. Here is the formal definition.


Definition 3. Let $t^{\dagger} \in T$. A selection rule $\sigma$ is a $t^{\dagger}$-pass-fail rule if:

$$ \sigma(t) = \begin{cases} 1 & \text{if } t \ge t^{\dagger} \\ 0 & \text{if } t < t^{\dagger} \end{cases} $$

for any $t \in T$.


Let $\sigma$ be a pass-fail rule with allocation cutoff $t^{\dagger}$. Observe that varying the
cutoff $t^{\dagger}$ from zero to infinity (essentially) spans the entire family of monotone and
deterministic selection rules.

We now show that the investment rule implemented by a pass-fail selection
rule is essentially unique. Let $\hat{\theta}(t^{\dagger})$ be the initial type $\theta \in \Theta$ solving the equation
$\gamma c(t^{\dagger},\theta) = 1$ when it exists. All agents with initial type $\hat{\theta}(t^{\dagger})$ are thus
indifferent between keeping type $\hat{\theta}(t^{\dagger})$ at zero cost and acquiring the final type $t^{\dagger}$ at
a cost equal to 1. Given the quadratic form of the cost function, the threshold $\hat{\theta}(t^{\dagger})$
is equal to $t^{\dagger} - \sqrt{2/\gamma}$ whenever $\underline{\theta} + \sqrt{2/\gamma} \le t^{\dagger} \le \bar{\theta} + \sqrt{2/\gamma}$, and we set $\hat{\theta}(t^{\dagger}) = \underline{\theta}$
whenever $t^{\dagger} < \underline{\theta} + \sqrt{2/\gamma}$. First, agents whose initial type is below $\hat{\theta}(t^{\dagger})$ have a cost
which is too high to acquire the final type $t^{\dagger}$ and, as a result, are rejected by the
designer and keep their initial type at zero cost. Agents with a type in between
$\hat{\theta}(t^{\dagger})$ and $t^{\dagger}$, in turn, all choose to invest in the minimal final type guaranteeing
admission with probability one, which corresponds exactly to $t^{\dagger}$. Finally, the
agents whose initial types are above $t^{\dagger}$ are approved with probability one at zero
cost. Therefore, the following investment rule (illustrated on figure 1.1) corresponds
to the (essentially) unique investment rule that is implementable by a $t^{\dagger}$-pass-fail
selection rule:

$$ \tau^{\dagger}(\theta) = \begin{cases} \theta & \text{if } \theta \in [\underline{\theta}, \hat{\theta}(t^{\dagger})[ \\ t^{\dagger} & \text{if } \theta \in [\hat{\theta}(t^{\dagger}), t^{\dagger}] \\ \theta & \text{if } \theta \in [t^{\dagger}, \bar{\theta}] \end{cases} \tag{1.1} $$

if $t^{\dagger} \le \bar{\theta}$, and

$$ \tau^{\dagger}(\theta) = \begin{cases} \theta & \text{if } \theta \in [\underline{\theta}, \hat{\theta}(t^{\dagger})[ \\ t^{\dagger} & \text{if } \theta \in [\hat{\theta}(t^{\dagger}), \bar{\theta}] \end{cases} \tag{1.2} $$

if $t^{\dagger} > \bar{\theta}$. A $t^{\dagger}$-pass-fail rule thus segments the population of agents in two
categories: the agents who keep their initial types at no cost and those who bunch
at the allocation cutoff $t^{\dagger}$. Any mechanism $(\sigma,\tau)$ such that $\sigma$ is a $t^{\dagger}$-pass-fail rule
and $\tau$ is either given by equation (1.1) if $t^{\dagger} \le \bar{\theta}$ or by equation (1.2) if $t^{\dagger} > \bar{\theta}$ is
called a $t^{\dagger}$-pass-fail mechanism.

[Figure 1.1: Incentive-compatible investment under a $t^{\dagger}$-pass-fail rule. Panel (a): $t^{\dagger} < \underline{\theta} + \sqrt{2/\gamma}$. Panel (b): $\underline{\theta} + \sqrt{2/\gamma} < t^{\dagger} < \bar{\theta}$.]
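The bunching logic of a pass-fail rule can be sketched in a few lines. This is a hedged illustration rather than the thesis' own code: the parameter values (a cutoff of 0.5, a cost scale of 1, initial types in [-2, 2]) and all function names are assumptions chosen for the example. The closed-form rule is compared with a brute-force search over a grid of final types.

```python
import math


def cost(t: float, theta: float) -> float:
    """Unscaled cost c(t, theta) = max(0, t - theta)^2 / 2."""
    return max(0.0, t - theta) ** 2 / 2.0


def indifferent_type(cutoff: float, gamma: float, theta_min: float) -> float:
    """Lowest initial type willing to invest up to the cutoff, floored at theta_min."""
    return max(theta_min, cutoff - math.sqrt(2.0 / gamma))


def pass_fail_investment(theta: float, cutoff: float, gamma: float, theta_min: float) -> float:
    """Closed-form investment: bunch at the cutoff or keep the initial type."""
    if indifferent_type(cutoff, gamma, theta_min) <= theta < cutoff:
        return cutoff
    return theta


def payoff(t: float, theta: float, cutoff: float, gamma: float) -> float:
    """Agent payoff sigma(t) - gamma * c(t, theta) under a cutoff rule."""
    return (1.0 if t >= cutoff else 0.0) - gamma * cost(t, theta)


def is_best_response(theta, cutoff, gamma, theta_min, candidates, tol=1e-12) -> bool:
    """True if the closed-form choice does at least as well as every candidate final type."""
    chosen = pass_fail_investment(theta, cutoff, gamma, theta_min)
    best = max(payoff(t, theta, cutoff, gamma) for t in candidates)
    return payoff(chosen, theta, cutoff, gamma) >= best - tol


# Illustrative parameters (assumptions, not taken from the text).
GAMMA, CUTOFF, THETA_MIN = 1.0, 0.5, -2.0
FINAL_TYPES = [-2.0 + 0.01 * k for k in range(601)]    # candidate final types in [-2, 4]
INITIAL_TYPES = [-2.0 + 0.1 * k for k in range(41)]    # initial types in [-2, 2]
print(all(is_best_response(th, CUTOFF, GAMMA, THETA_MIN, FINAL_TYPES)
          for th in INITIAL_TYPES))
```

Payoffs rather than chosen types are compared because agents above the cutoff are indifferent between several final types, so the maximizer is not unique.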


Assumptions. Let us start by defining $\theta_0 = \min\{\theta \in \Theta \mid \gamma c(0,\theta) \le 1\}$. The
type $\theta_0$ corresponds to the lowest initial type for which investing at the designer’s
preference threshold would be feasible if the good were allocated with probability
one. Given the quadratic form of the cost, we have the closed-form solution
$\theta_0 = -\sqrt{2/\gamma}$. Any agent with an initial type strictly lower than $\theta_0$ thus has too
high a cost to reach any type above the designer’s preference threshold under any
monotone selection rule. Therefore, any agent with an initial type below $\theta_0$ must
be rejected with probability one under the optimal allocation. We introduce our
first assumption, which concerns the designer’s preference threshold.


Assumption 1. The density function $f$ is non-increasing on the interval $[\theta_0, \bar{\theta}]$.


This assumption has two implications. First, the density function is decreasing
on the right tail of the type distribution. Second, it implies that the designer’s
preference threshold lies sufficiently to the right of that tail, so that the density
function is non-increasing over the interval $[\theta_0, \bar{\theta}]$. We also make an assumption
on the magnitude of the agents’ investment costs.


Assumption 2. The scaling parameter $\gamma$ satisfies $\gamma > 1/c(0, \underline{\theta})$.


This assumption clarifies the exposition of our results by eliminating cases
where the magnitude of the investment costs would be so small that the designer
would allocate the resource to all agents under the optimal rule. However, it is not
necessary to prove our main result. Formally, this assumption ensures that $\theta_0$ is
bounded away from $\underline{\theta}$, so we can exclude all the types in the interval $[\underline{\theta}, \theta_0[$ from
our subsequent analysis.


Optimal pass-fail rule. Optimizing the designer’s payoff within the class of
pass-fail mechanisms is much simpler than our original problem (P). Instead
of solving an infinite-dimensional program, the designer only has to select a
one-dimensional allocation cutoff $t^{\dagger}$. In the next proposition we characterize the optimal
mechanism within the class of pass-fail mechanisms.


Proposition 1. Let $t_\gamma^*$ be the unique solution to the equation $\psi(t) = 0$, where:

$$ \psi(t) = \begin{cases} t - \dfrac{F(t) - F(\hat{\theta}(t))}{f(\hat{\theta}(t))} & \text{if } t \le \bar{\theta} \\[2ex] t - \dfrac{1 - F(\hat{\theta}(t))}{f(\hat{\theta}(t))} & \text{if } t > \bar{\theta} \end{cases} $$

Then, the $t_\gamma^*$-pass-fail mechanism, denoted $(\sigma_\gamma^*, \tau_\gamma^*)$, is optimal in the class of
pass-fail mechanisms. Moreover, under assumptions 1 and 2, the optimal allocation
cutoff $t_\gamma^*$ belongs to the interval $]0, \sqrt{2/\gamma}[$, and the indifferent type $\theta_\gamma^* = \hat{\theta}(t_\gamma^*)$
belongs to $]\theta_0, 0[$, for any $\gamma > 1/c(0, \underline{\theta})$.


Proof. See appendix A.1. □

[Figure 1.2: The optimal pass-fail allocation when $f(\theta) = \rho e^{-\rho\theta} / (e^{-\rho\underline{\theta}} - e^{-\rho\bar{\theta}})$ with $\rho = 2$ and $\gamma = 1$. Panel (a): Optimal pass-fail selection rule. Panel (b): Implemented investment.]
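For readers who want to reproduce a cutoff of this kind numerically, here is a minimal sketch. It assumes the truncated exponential density of figure 1.2 with $\rho = 2$ and $\gamma = 1$; the support [-2, 2] is an assumption made for the example, since the bounds are not reported in the text. The code looks for the zero of the derivative of the pass-fail payoff $t\,(F(t) - F(\hat{\theta}(t))) + \int_t^{\bar{\theta}} \theta f(\theta)\,d\theta$ by bisection, which is valid here because that derivative changes sign exactly once on the search interval.

```python
import math

RHO, GAMMA = 2.0, 1.0                # values quoted in the caption of figure 1.2
THETA_MIN, THETA_MAX = -2.0, 2.0     # assumed support; not given in the text
NORM = math.exp(-RHO * THETA_MIN) - math.exp(-RHO * THETA_MAX)


def pdf(theta: float) -> float:
    """Truncated exponential density on [THETA_MIN, THETA_MAX]."""
    return RHO * math.exp(-RHO * theta) / NORM


def cdf(theta: float) -> float:
    """Cumulative distribution function of the truncated exponential."""
    return (math.exp(-RHO * THETA_MIN) - math.exp(-RHO * theta)) / NORM


def marginal_payoff(t: float, gamma: float) -> float:
    """Derivative in t of the pass-fail payoff, for t <= THETA_MAX and theta_hat(t) >= THETA_MIN."""
    theta_hat = t - math.sqrt(2.0 / gamma)
    return cdf(t) - cdf(theta_hat) - t * pdf(theta_hat)


def optimal_cutoff(gamma: float, tol: float = 1e-12) -> float:
    """Bisection on the marginal payoff over [0, min(sqrt(2/gamma), THETA_MAX)]."""
    lo, hi = 0.0, min(math.sqrt(2.0 / gamma), THETA_MAX)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_payoff(mid, gamma) > 0.0:
            lo = mid      # payoff still increasing: the optimum lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_star = optimal_cutoff(GAMMA)
theta_star = t_star - math.sqrt(2.0 / GAMMA)
print(f"cutoff = {t_star:.4f}, indifferent type = {theta_star:.4f}")
```

With this particular density the first-order condition can also be solved by hand, giving a cutoff of $(1 - e^{-\rho\sqrt{2/\gamma}})/\rho$, roughly 0.47 for the values above, which the bisection should reproduce. The cutoff is strictly positive and the indifferent type strictly negative, in line with the proposition.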


We illustrate the optimal pass-fail mechanism in figure 1.2 in the case of
exponentially distributed types and a cost scaling factor normalized to one. Agents
with an initial type given by $\theta_\gamma^*$ are indifferent between investing at the cutoff $t_\gamma^*$
and staying at $\theta_\gamma^*$. All agents below do not invest because they have too high a cost
to reach $t_\gamma^*$ and are excluded by the designer. Agents in the interval $[\theta_\gamma^*, t_\gamma^*]$ bunch
at the allocation threshold, since it is the least costly final type that guarantees
allocation of the good with probability one. Agents with an initial type above $t_\gamma^*$, in
turn, are already guaranteed to be allocated the good without any investment and
thus keep their initial types at no cost.

The optimal allocation cutoff $t_\gamma^*$ is strictly above the preference threshold of the
designer. It is therefore optimal for the designer to commit to rejecting some types above
her preference threshold with probability one, namely the final types in $[0, t_\gamma^*[$. This commitment on the part of the
designer benefits her by encouraging a sufficiently large mass of agents to invest
in a strictly positive type in equilibrium.


Optimal selection rule. We now state our main result. We show that the
mechanism exhibited in proposition 1 is not only optimal in the class of pass-fail
mechanisms but solves the designer’s program (P).


Theorem 1. If assumption 1 is satisfied, then $(\sigma_\gamma^*, \tau_\gamma^*)$ is a solution to (P).


We defer the proof of theorem 1 to section 1.4. The main intuition for the result
is the following. Any optimal mechanism must balance two conflicting forces
acting on the designer’s expected payoff. On the one hand, random monotone
selection rules induce agents with initial types already above the selection cutoff to
invest in higher final types, which benefits the designer. On the other hand, they also
increase the probability of unduly rejecting some agents having invested in a final
type above the designer’s preference threshold, and decrease the mass of agents
bunching at the selection cutoff. These two effects harm the designer’s expected
payoff. Theorem 1 establishes that under assumption 1 the negative effects always
dominate.

[Figure 1.3: First-best (dashed lines) vs. second-best investment rule (plain lines).]


1.3.2. Properties of the optimal allocation 


In this section, we describe the properties of the optimal selection rule. We first
show that the optimal allocation induces rationing compared to the first-best
solution. Next, we perform comparative statics with respect to the cost scaling
parameter $\gamma$.


Comparison to the first-best solution. We emphasized that it is optimal for the
designer to commit ex-ante to rejecting types in the interval $[0, t_\gamma^*[$. In comparison,
if the designer could observe the initial type of the agents, she could implement the
first-best optimum by rejecting agents that do not invest in the maximum feasible
final type given their cost, given by $\bar{t}(\theta) = \max\{t \in T \mid \gamma c(t,\theta) \le 1\} = \theta + \sqrt{2/\gamma}$.
The designer would thus admit all agents whose initial type lies in the interval
$[\theta_0, \bar{\theta}]$ and implement their maximum possible level of investment. As a result, all
the agents in the interval $[\theta_0, \theta_\gamma^*[$ are rationed under the second-best. Moreover,
the designer has to forgo the value $\int_{\theta_\gamma^*}^{\bar{\theta}} (\bar{t}(\theta) - \tau_\gamma^*(\theta)) f(\theta)\, d\theta$ due to the decrease in
investments. This discrepancy is represented on figure 1.3.

Comparative statics. We first establish a preliminary result. We prove that if the
distribution of types is sufficiently decreasing, then the optimal allocation cutoff
never exceeds the highest initial type whatever the magnitude of investment costs.


Lemma 2. If $f(\underline{\theta}) > 1/\bar{\theta}$, then the optimal cutoff $t_\gamma^*$ belongs to the interval $[0, \bar{\theta}]$
for any $\gamma > 0$.


Proof. See appendix A.3.1. □


The condition $f(\underline{\theta}) > 1/\bar{\theta}$ ensures that a sufficiently high mass of agents is
concentrated at the bottom of the type distribution and thus guarantees that $t_\gamma^*$ is
bounded above by $\bar{\theta}$ for any $\gamma > 0$. The reason for this is that the mass effect
always dominates the incentive effect for the designer when the type distribution
decreases steeply. In other words, fixing a threshold of approval higher than $\bar{\theta}$
would always exclude too large a mass of low-type agents compared to the gain in
terms of type investment at the cutoff $t_\gamma^*$.

We conduct the comparative statics exercise under the assumption that $f(\underline{\theta}) > 1/\bar{\theta}$
in the main text. This assumption is made for readability and our results could
easily be extended when it is not satisfied. The designer’s optimal expected payoff
is given by

$$ V_\gamma^* = t_\gamma^* \left( F(t_\gamma^*) - F(\theta_\gamma^*) \right) + \int_{t_\gamma^*}^{\bar{\theta}} \theta f(\theta)\, d\theta. $$

Moreover, the agents’ interim optimal payoff is given by:

$$ U_\gamma^*(\theta) = \begin{cases} 0 & \text{if } \theta \in [\underline{\theta}, \theta_\gamma^*[ \\ 1 - \gamma c(t_\gamma^*, \theta) & \text{if } \theta \in [\theta_\gamma^*, t_\gamma^*[ \\ 1 & \text{if } \theta \in [t_\gamma^*, \bar{\theta}] \end{cases} $$


First, we show that the designer’s optimal allocation cutoff becomes looser as the
investment costs increase.


Proposition 2. The optimal cutoff $t_\gamma^*$ defined in proposition 1 is monotonically
decreasing in $\gamma$, while the minimal initial type being
admitted, $\theta_\gamma^*$, is monotonically increasing in $\gamma$. Moreover,
the cutoffs $t_\gamma^*$ and $\theta_\gamma^*$ both converge to zero as $\gamma$ tends to infinity.


Proof. See appendix A.3.2. □
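The monotonicity pattern can be illustrated on a single example. The following sketch is illustrative only: it assumes a truncated exponential density with rate 2 on the support [-2, 2] (the support is an assumption), restricts attention to values of $\gamma$ large enough for assumption 2 to hold on that support, and recomputes the cutoff by bisection on the derivative of the pass-fail payoff for each value.

```python
import math

RHO = 2.0
THETA_MIN, THETA_MAX = -2.0, 2.0     # assumed support; not given in the text
NORM = math.exp(-RHO * THETA_MIN) - math.exp(-RHO * THETA_MAX)


def pdf(theta: float) -> float:
    return RHO * math.exp(-RHO * theta) / NORM


def cdf(theta: float) -> float:
    return (math.exp(-RHO * THETA_MIN) - math.exp(-RHO * theta)) / NORM


def optimal_cutoff(gamma: float, tol: float = 1e-12) -> float:
    """Zero of F(t) - F(theta_hat(t)) - t * f(theta_hat(t)), found by bisection."""
    shift = math.sqrt(2.0 / gamma)           # distance between cutoff and indifferent type
    lo, hi = 0.0, min(shift, THETA_MAX)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        slope = cdf(mid) - cdf(mid - shift) - mid * pdf(mid - shift)
        if slope > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Cost scales satisfying gamma > 1 / c(0, THETA_MIN) = 0.5 on the assumed support.
GAMMAS = [1.0, 2.0, 4.0, 8.0, 16.0, 64.0]
cutoffs = [optimal_cutoff(g) for g in GAMMAS]
marginal_types = [c - math.sqrt(2.0 / g) for c, g in zip(cutoffs, GAMMAS)]

for g, c, m in zip(GAMMAS, cutoffs, marginal_types):
    print(f"gamma = {g:5.1f}   cutoff = {c:.4f}   indifferent type = {m:.4f}")
```

In this example the cutoff shrinks and the indifferent type rises as the cost scale grows, and both approach zero from opposite sides.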


The insight for proposition 2 is as follows. Under the optimal mechanism, the
agents with initial type $\theta_\gamma^*$ are indifferent between being rejected and keeping their
initial type at cost $\gamma c(\theta_\gamma^*, \theta_\gamma^*) = 0$, or investing in the final type $t_\gamma^*$ at a cost
$\gamma c(t_\gamma^*, \theta_\gamma^*) = 1$. Increasing the magnitude of the investment cost $\gamma$ has the effect
of increasing the marginal type $\theta_\gamma^*$, thereby excluding a larger mass of agents with
low initial types. The optimal response of the designer to mitigate this reduction
in the mass of approved agents at the bottom is to lower the admission threshold
to restore higher investment incentives for those agents. Proposition 2 has the
following direct consequence: when the cost of investment becomes arbitrarily
large for all agents, the optimal mechanism is for no agent to invest in a new type
and for the designer to approve only positive initial types.


Corollary 1 (Asymptotics). The optimal selection rule and the optimal outcome
respectively converge to $\mathbb{1}\{t \ge 0\}$ and to $\mathbb{1}\{\theta \ge 0\}$ as $\gamma \to +\infty$. Accordingly, the
optimal payoffs converge to:

$$ V_\gamma^* \xrightarrow[\gamma \to +\infty]{} \int_{0}^{\bar{\theta}} \theta f(\theta)\, d\theta, $$

and:

$$ U_\gamma^* \xrightarrow[\gamma \to +\infty]{} 1 - F(0). $$


Corollary 1 confirms that when the investment cost becomes infinitely large for
all agents, the optimal mechanism is the one that would be optimal if the designer
could only use an approval rule, that is, if she solved the following problem:

$$ \max_{\sigma \colon \Theta \to [0,1]} \int_{\underline{\theta}}^{\bar{\theta}} \theta\, \sigma(\theta)\, f(\theta)\, d\theta. $$

Proposition 2 allows us to derive comparative statics on the designer’s welfare as
well as on the agents’ welfare when $\gamma$ increases.


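Proposition 3 (Comparative statics). Under the optimal mechanism $(\sigma_\gamma^*, \tau_\gamma^*)$ the
following claims are satisfied:

(i) The designer’s welfare $V_\gamma^*$ is decreasing in $\gamma$.

(ii) There exists a cutoff $\tilde{\theta}_\gamma \in \,]\theta_\gamma^*, t_\gamma^*[$ such that the interim welfare of agents
$U_\gamma^*(\theta)$ is decreasing in $\gamma$ for all $\theta \in [\theta_\gamma^*, \tilde{\theta}_\gamma]$ and increasing in $\gamma$ for all $\theta \in [\tilde{\theta}_\gamma, t_\gamma^*]$.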


Quite naturally, proposition 3 establishes that the designer’s payoff decreases
as the investment becomes costlier. First, the allocation cutoff decreases as $\gamma$
increases, which implies that agents who bunch at the threshold invest in a lower
final type in equilibrium. Moreover, the minimal type being approved, $\theta_\gamma^*$,
increases as $\gamma$ grows. Hence, the total mass of agents being approved under the
optimal mechanism decreases as the cost of investment grows. Both effects lower
the optimal payoff of the designer.

An upward scaling in the investment cost, however, has a non-monotone effect
on the agents’ interim payoff. Indeed, the change in the agents’ interim welfare
due to a marginal increase in the investment cost can be decomposed into two effects
which go in opposite directions. First, the direct effect of an increase in $\gamma$ is to scale
up the cost of investment for all types. Second, the indirect effect of an increase
in $\gamma$ is to lower the admission standard $t_\gamma^*$, which decreases the distance from any
$\theta$ to the allocation cutoff $t_\gamma^*$. This decreases the cost of investment for all agents.
Proposition 3 shows that there exists a type threshold $\tilde{\theta}_\gamma$ above which all agents
gain from an increase in $\gamma$ and below which all types lose. The reason
is that the indirect effect is stronger than the direct effect for all types which are
sufficiently close to the allocation cutoff $t_\gamma^*$.

Comparative statics with respect to the aggregate welfare of agents turn out to be more
intricate. Formally, the agents’ aggregate welfare is given by:

$$ U_\gamma^* = \underbrace{1 - F(\theta_\gamma^*)}_{\text{ex-ante allocation probability}} \; - \; \underbrace{\gamma \int_{\theta_\gamma^*}^{t_\gamma^*} c(t_\gamma^*, \theta)\, f(\theta)\, d\theta}_{\text{aggregate investment cost}}. $$

The effect of $\gamma$ on agents’ welfare is ambiguous. As before, the direct effect of an increase in
$\gamma$ is to scale up the investment cost for all types of agents. Increasing $\gamma$ also
makes $\theta_\gamma^*$ increase, so the ex-ante allocation probability is decreasing. However,
an increase in $\gamma$ also decreases the allocation cutoff $t_\gamma^*$. This decreases the ex-ante
probability of incurring the cost for agents at the top, and decreases the cost of
investing up to the cutoff for bunching agents. Depending on which of the effects
dominates, the welfare of agents might increase or decrease. This suggests that the
welfare of agents is not monotonic as a function of the magnitude of the investment
costs.


1.4. PROOF OF THEOREM 1


In this section we state the main steps of the proof for theorem 1. Additional
proofs can be found in appendix A.2. We first investigate the consequences of
the monotonicity constraint imposed by lemma 1 for implementable investment
rules. Next, we provide a characterization of incentive-compatible mechanisms in
terms of a transformation of the agents’ indirect utility function that we call the
pseudo-utility function. Thanks to this characterization, we show that the problem
of the designer can be restated as a problem of calculus of variations where the
optimization variable is the agents’ pseudo-utility function. We prove that the
objective functional of this variational program is an upper-semicontinuous convex
functional on a compact and convex set. As a result, there must exist some extreme
point of the domain that is a solution of the designer’s problem. We provide
necessary conditions on the shape of these extreme points, which, together with
the tangent inequality for convex functionals, allow us to establish the optimality
of the pass-fail selection rules.


Admissible mechanisms: definition. Lemma 1 narrows the set of the investment
rules that can be implemented by the designer. When the selection rule is monotone,
no agent ever invests in a type that is strictly lower than its initial type, since this
could only decrease its allocation probability. Therefore, we must have $\tau(\theta) \ge \theta$
for any $\theta \in \Theta$. We now show that there also exists an upper bound on agents’
investments. Let us define $\bar{t}(\theta) = \max\{t \in T \mid \gamma c(t,\theta) \le 1\}$. The type $\bar{t}(\theta)$
corresponds to the maximal type an agent with initial type $\theta$ would be willing to
invest in, if he were allocated the good for sure. Since $c(\cdot,\theta)$ is a continuous and
non-decreasing function over $T$ for any $\theta \in \Theta$, the upper bound $\bar{t}(\theta)$ is defined
implicitly by the following equation in $t$:

$$ \gamma c(t,\theta) = 1, \tag{UB} $$

which has a closed-form solution given by $\bar{t}(\theta) = \theta + \sqrt{2/\gamma}$ for any $\theta \in \Theta$. Hence,
we always have $\theta \le \tau(\theta) \le \bar{t}(\theta)$ for all $\theta \in \Theta$ and we can normalize $T$ to $[0, \bar{t}(\bar{\theta})]$
without loss of generality.

Any monotone selection rule must also induce agents approved with non-zero
probability to acquire a non-negative final type. Let $\sigma$ be a monotone selection
rule and let $t^{\dagger}$ be the lowest final type guaranteeing a strictly positive approval
probability to the agents under $\sigma$, i.e.:

$$ t^{\dagger} = \min\{t \in T \mid \sigma(t) > 0\}. $$

Since $\sigma$ is monotone, $t^{\dagger}$ must be non-negative. Let $\theta^{\dagger}$ be the initial type defined as
the solution to the following equation in $\theta$:

$$ \sigma(t^{\dagger}) = \gamma c(t^{\dagger}, \theta). $$

Denoting $\sigma(t^{\dagger})$ by $\sigma^{\dagger}$, we obtain:

$$ \theta^{\dagger} = t^{\dagger} - \sqrt{\frac{2\sigma^{\dagger}}{\gamma}}. $$

Clearly, $\theta^{\dagger}$ corresponds to the lowest type investing in a non-zero final type under
the selection rule $\sigma$. In particular, an agent with type $\theta^{\dagger}$ is always indifferent
between keeping type $\theta^{\dagger}$ and acquiring final type $t^{\dagger}$. Hence, we always have
$\tau(\theta) = \theta$ for all $\theta \in [\theta_0, \theta^{\dagger}[$ and $\tau(\theta) \ge t^{\dagger} \ge 0$ for all $\theta \in [\theta^{\dagger}, \bar{\theta}]$. Remark that,
differently from $\theta_0$, the cutoff $\theta^{\dagger}$, together with the probability $\sigma^{\dagger}$, is part of the
design and thus endogenous.

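Third, let $(\sigma,\tau)$ be a mechanism such that $\sigma$ is monotone and assume that
there exists a type $\theta$ such that $\tau(\theta) = \bar{t}(\theta)$. Since $\tau$ satisfies (IC), it must be
non-decreasing by Topkis’ theorem. Therefore, there must exist a type $\hat{\theta} \in \Theta$
such that $\hat{\theta} = \min\{\theta \in \Theta \mid \tau(\theta) = \bar{t}(\theta)\}$. Since an agent can always keep his initial
type at zero cost, investing in $\bar{t}(\hat{\theta})$ at a cost equal to 1 can only be optimal if it yields an
allocation probability of at least 1. The condition (UB) then implies that
$\sigma(\bar{t}(\hat{\theta})) = 1$. Hence, since $\sigma$ is non-decreasing and bounded above by 1, it must also
be that $\sigma(t) = 1$ for all $t \ge \bar{t}(\hat{\theta})$. As a consequence, the final type $\bar{t}(\hat{\theta})$ must be
the smallest needed to pass with probability 1 under selection rule $\sigma$. Firstly, it
might be the case that $\bar{t}(\hat{\theta}) \le \bar{\theta}$. Then, since $\tau$ satisfies (IC), all agents born with
types in the interval $[\hat{\theta}, \bar{t}(\hat{\theta})[$ must invest exactly at the threshold $\bar{t}(\hat{\theta})$ whereas all
agents born with types in $[\bar{t}(\hat{\theta}), \bar{\theta}]$ are already guaranteed to pass with probability
1, so $\tau(\theta) = \theta$ for all $\theta \ge \bar{t}(\hat{\theta})$. Conversely, it might be the case that $\bar{t}(\hat{\theta}) > \bar{\theta}$. This
means that no initial type is guaranteed to pass with probability one
under $\sigma$ without investing and, again by (IC), all agents born with types in $[\hat{\theta}, \bar{\theta}]$ must invest
at $\bar{t}(\hat{\theta})$. We sum up the previous discussion in the following corollary.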

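Corollary 2. If $\sigma$ satisfies properties (i) and (ii) from lemma 1, then:

(i) $\theta \le \tau(\theta) \le \bar{t}(\theta)$ for all $\theta \in [\underline{\theta}, \bar{\theta}]$;

(ii) there always exists $\theta^{\dagger} \in [\theta_0, \bar{\theta}]$ such that $\tau(\theta) = \theta$ for all $\theta \in [\theta_0, \theta^{\dagger}[$ and
$\tau(\theta) \ge 0$ for all $\theta \in [\theta^{\dagger}, \bar{\theta}]$, and;

(iii) if there exists $\hat{\theta} \in [\theta_0, \bar{\theta}]$ such that $\tau(\hat{\theta}) = \bar{t}(\hat{\theta})$ then we have the following:

(a) if $\bar{t}(\hat{\theta}) \le \bar{\theta}$, then $\tau(\theta) = \bar{t}(\hat{\theta})$ for all $\theta \in [\hat{\theta}, \bar{t}(\hat{\theta})[$ and $\tau(\theta) = \theta$ for all
$\theta \in [\bar{t}(\hat{\theta}), \bar{\theta}]$;
(b) if $\bar{t}(\hat{\theta}) > \bar{\theta}$, then $\tau(\theta) = \bar{t}(\hat{\theta})$ for all $\theta \in [\hat{\theta}, \bar{\theta}]$.

Accordingly, we say that any mechanism $(\sigma,\tau)$ satisfying (IC) as well as all
properties from lemma 1 and corollary 2 is admissible.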


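Admissible mechanisms: characterization. The indirect utility of an agent of
type $\theta$ under some selection rule $\sigma$ is given by

$$ U(\theta) = \max_{t \in T} \; \sigma(t) - \gamma c(t,\theta). $$

Given the quadratic form of the cost function, simple algebra on the agents’
objective function (expanding $(t-\theta)^2$ for $t \ge \theta$, which is without loss under a
monotone selection rule) reveals that

$$ U(\theta) = \gamma \left( \max_{t \in T} \left\{ t\theta - \left( \frac{t^{2}}{2} - \frac{\sigma(t)}{\gamma} \right) \right\} - \frac{\theta^{2}}{2} \right). $$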

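Accordingly, we say that the value function defined by

$$ u(\theta) = \max_{t \in T} \; t\theta - \left( \frac{t^{2}}{2} - \frac{\sigma(t)}{\gamma} \right) $$

is the agent’s pseudo-utility function under the rule $\sigma$. Let $\underline{u}$ be the pseudo-utility
function induced by the selection rule $\underline{\sigma}(t) = 0$ and $\bar{u}$ be the pseudo-utility function
induced by the selection rule $\bar{\sigma}(t) = \mathbb{1}\{t \ge 0\}$.¹² We have:

$$ \underline{u}(\theta) = \frac{\theta^{2}}{2} $$

and

$$ \bar{u}(\theta) = \begin{cases} \dfrac{1}{\gamma} & \text{if } \theta \in [\theta_0, 0[ \\[1.5ex] \dfrac{1}{\gamma} + \dfrac{\theta^{2}}{2} & \text{if } \theta \in [0, \bar{\theta}] \end{cases} $$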

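The next lemma characterizes admissible mechanisms in terms of properties
of the pairs $(u,\tau)$.


Lemma 3. A mechanism $(\sigma,\tau)$ is admissible if, and only if, the pseudo-utility
function $u$ and investment $\tau$ induced by $\sigma$ satisfy the following properties:

(i) $u$ is a convex function over $[\theta_0, \bar{\theta}]$ and is hence differentiable almost everywhere
in that interval. Moreover, the envelope formula $u'(\theta) = \tau(\theta)$ must
hold at all points where $u$ is differentiable;

(ii) $\underline{u}(\theta) \le u(\theta) \le \bar{u}(\theta)$, for all $\theta \in [\theta_0, \bar{\theta}]$;

(iii) $\theta \le u'(\theta) \le \bar{t}(\theta)$ for almost all $\theta \in [\theta_0, \bar{\theta}]$;

(iv) $u(\theta) + u'(\theta)^{2}/2 - \theta u'(\theta) \le 1/\gamma$ for almost all $\theta \in [\theta_0, \bar{\theta}]$;

(v) There always exists $\theta^{\dagger} \in [\theta_0, \bar{\theta}]$ such that $u(\theta) = \underline{u}(\theta)$ for all $\theta \in [\theta_0, \theta^{\dagger}]$
and $u$ is increasing and strictly above $\underline{u}(\theta)$ for all $\theta \in \,]\theta^{\dagger}, \bar{\theta}]$;

(vi) If there exists $\hat{\theta} \in [\theta_0, \bar{\theta}]$ such that $u'(\hat{\theta}) = \bar{t}(\hat{\theta})$, then, if $\bar{t}(\hat{\theta}) \le \bar{\theta}$ we must
have:

$$ u(\theta) = \begin{cases} u(\theta) & \text{if } \theta \in [\theta_0, \hat{\theta}[ \\ u(\hat{\theta}) + \bar{t}(\hat{\theta})(\theta - \hat{\theta}) & \text{if } \theta \in [\hat{\theta}, \bar{t}(\hat{\theta})] \\ \bar{u}(\theta) & \text{if } \theta \in \,]\bar{t}(\hat{\theta}), \bar{\theta}] \end{cases} $$

and if $\bar{t}(\hat{\theta}) > \bar{\theta}$ then:

$$ u(\theta) = \begin{cases} u(\theta) & \text{if } \theta \in [\theta_0, \hat{\theta}[ \\ u(\hat{\theta}) + \bar{t}(\hat{\theta})(\theta - \hat{\theta}) & \text{if } \theta \in [\hat{\theta}, \bar{\theta}] \end{cases} $$


Proof. See appendix A.2.2. □

¹²Remark that $\underline{\sigma}$ and $\bar{\sigma}$ are respectively the lowest and greatest selection rules in the
class of monotone selection rules.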


Property (i) is a consequence of Proposition 2 in Rochet (1987) and ensures
that the pair $(u,\tau)$ is consistent with the first-order conditions of the agents’
optimization program wherever $\sigma$ is differentiable.¹³ By the envelope formula and the
monotonicity of $\tau$, the function $u'$ is non-decreasing, hence the function $u$ must be convex. Another way
to see why $u$ must be convex is the following fact. If we let $\varphi(t) = t^{2}/2 - \sigma(t)/\gamma$,
then:

$$ u(\theta) = \max_{t \in T} \; \theta t - \varphi(t). $$

The function $u$ thus corresponds to the Legendre-Fenchel transform of the function
$\varphi$, and such a transform is always convex, being a pointwise maximum of affine functions of $\theta$.
The previous expression also highlights the fact
that $u$ corresponds to the indirect utility an agent with quasilinear payoff would get
in a monopolistic screening problem (Mussa and Rosen, 1978) under some tariff $\varphi$.
Note, however, that unlike standard screening problems, the function $\varphi$ must be
bounded between $\bar{\varphi}(t) = t^{2}/2 - \underline{\sigma}(t)/\gamma$ and $\underline{\varphi}(t) = t^{2}/2 - \bar{\sigma}(t)/\gamma$ because of the
probability constraint on $\sigma$ and of lemma 1. This implies property (ii). Property
(iii), in turn, is simply the translation in terms of the pseudo-utility function of
corollary 2. Let us now explain property (iv). For this, we first recall that the
outcome $x \colon [\theta_0, \bar{\theta}] \to [0,1]$ of the selection rule $\sigma$ is given by $x(\theta) = \sigma(\tau(\theta))$
for all $\theta \in [\theta_0, \bar{\theta}]$. Using property (i), we can rewrite the outcome as a function of
$u$ and $u'$ by substituting $\tau(\theta)$ by $u'(\theta)$ almost everywhere. Indeed, remembering
that $U(\theta) = \gamma(u(\theta) - \theta^{2}/2)$ and remarking that $\sigma(\tau(\theta)) = U(\theta) + \gamma c(\tau(\theta), \theta)$ we
obtain:

$$ x(\theta) = \gamma \left[ u(\theta) + \frac{u'(\theta)^{2}}{2} - \theta u'(\theta) \right], $$

for almost every $\theta \in [\theta_0, \bar{\theta}]$. Thus, property (iv) expresses the fact that $x$ must
be bounded above by 1. Importantly, it is easy to show that for any $u$ satisfying
properties (ii) and (iii), we have $x(\theta) \ge 0$ almost everywhere. Nevertheless, we
need to add property (iv) in the characterization as there exist some functions
satisfying (ii) and (iii) such that $x(\theta) > 1$ for some $\theta$ in $[\theta_0, \bar{\theta}]$. Finally, properties
(v) and (vi) are direct consequences of corollary 2.

¹³We know that, as a non-decreasing function, $\sigma$ must be differentiable almost everywhere on $T$,
and continuous except maybe on a countable subset of points.

[Figure 1.4: An admissible pseudo-utility function $u \in \mathcal{U}$. Its induced investment rule is given by its derivative $u'$. Here $u(\theta) = 0.45 \times (\theta + 0.4)^{2} + 0.1 \times (\theta + 0.4) + 0.08$. Panel (a): An admissible pseudo-utility $u$. Panel (b): Its induced investment rule $u'$.]
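As an illustration of the outcome formula, the sketch below evaluates $x(\theta) = \gamma\,[u(\theta) + u'(\theta)^{2}/2 - \theta u'(\theta)]$ for the pseudo-utility induced by a pass-fail rule. The piecewise expression for $u$ used in the code was worked out by hand from the definition of the pseudo-utility and is an assumption of this example, as are the parameter values (cutoff 0.5, unit cost scale). If the formula is right, it should return the allocation probabilities 0 below the indifferent type and 1 above it.

```python
import math


def pass_fail_pseudo_utility(theta: float, cutoff: float, gamma: float):
    """Return (u(theta), u'(theta)) induced by a cutoff rule (hand-derived closed form)."""
    theta_hat = cutoff - math.sqrt(2.0 / gamma)    # indifferent initial type
    if theta < theta_hat:
        # Rejected agents keep their type: u coincides with the lower bound theta^2 / 2.
        return theta ** 2 / 2.0, theta
    if theta <= cutoff:
        # Bunching agents invest exactly at the cutoff: u is linear with slope `cutoff`.
        return cutoff * theta - cutoff ** 2 / 2.0 + 1.0 / gamma, cutoff
    # Agents above the cutoff are selected without investing.
    return theta ** 2 / 2.0 + 1.0 / gamma, theta


def outcome(theta: float, u: float, du: float, gamma: float) -> float:
    """Allocation probability recovered from the pseudo-utility and its derivative."""
    return gamma * (u + du ** 2 / 2.0 - theta * du)


# Illustrative values (assumptions): cutoff 0.5 and cost scale 1.
for theta in [-1.5, 0.0, 1.0]:
    u, du = pass_fail_pseudo_utility(theta, cutoff=0.5, gamma=1.0)
    print(theta, outcome(theta, u, du, gamma=1.0))
```

On the bunching region the pseudo-utility is linear and property (iv) holds with equality, which is the shape described in property (vi) of lemma 3.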


Variational program. Using the previous characterization, we can first define
the following set.


Definition 4. The feasible set of pseudo-utility functions is defined by:

$$ \mathcal{U} = \left\{ u \colon [\theta_0, \bar{\theta}] \to \mathbb{R} \;\middle|\; u \text{ satisfies properties (ii) to (vi) from lemma 3} \right\}. $$


Any $u \in \mathcal{U}$ can be generated by some admissible mechanism $(\sigma, \tau)$ and any
admissible mechanism $(\sigma, \tau)$ induces some pseudo-utility $u \in \mathcal{U}$. We thus refer
to any $u \in \mathcal{U}$ as an admissible pseudo-utility. For the purpose of illustration, an
instance of admissible pseudo-utility $u \in \mathcal{U}$ is depicted in figure 1.4a while its
induced investment rule, given by its derivative, is depicted in figure 1.4b. Second,
we can also rewrite the integrand of the designer's objective as a function of the
agents' pseudo-utility by substituting $\tau(\theta)$ by $u'(\theta)$ and $\sigma(\tau(\theta))$ by $x(\theta)$ almost
everywhere:

\[
\tau(\theta)\, \sigma(\tau(\theta)) = \gamma u'(\theta) \left( u(\theta) + \frac{u'(\theta)^2}{2} - \theta u'(\theta) \right),
\]

for almost every $\theta \in [\theta_0, \bar\theta]$. Therefore, for any pseudo-utility function $u \in \mathcal{U}$, the
designer's payoff is given by:

\[
V(u) = \int_{\theta_0}^{\bar\theta} \Lambda(\theta, u(\theta), u'(\theta)) \, d\theta,
\]


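where, for any $\theta \in [\theta_0, \bar\theta]$, the function $\Lambda(\theta, \cdot, \cdot)$ is defined by

\[
\Lambda(\theta, x, y) = \gamma y \left( x + \frac{y^2}{2} - \theta y \right) f(\theta)
\]

for any $(x, y) \in [\underline{u}(\theta), \bar{u}(\theta)] \times [0, \bar\tau(\theta)]$. The problem of the designer can therefore
be expressed as the following problem of calculus of variations⁴:

\[
\max_{u \in \mathcal{U}} \int_{\theta_0}^{\bar\theta} \Lambda(\theta, u(\theta), u'(\theta)) \, d\theta \tag{V}
\]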


Unfortunately, we cannot apply the standard resolution methods to program (V)
because the objective functional does not satisfy the conditions required to use
the first-order approach (see, for instance, Clarke, 2013, Chapter 14, Section 1).
Therefore, we have to resort to different techniques to solve it.


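Parameterization. First of all, we parametrize the problem (V) with respect to
$\theta^\dagger$, the type at which the function $u$ starts taking off with a non-negative slope
from the lower bound $\underline{u}$. Consider the following set of functions:

\[
\mathcal{U}(\theta^\dagger) = \left\{ u \colon [\theta^\dagger, \bar\theta] \to \mathbb{R} \;\middle|\;
\begin{array}{l}
u \text{ is convex and increasing}, \\
\underline{u}(\theta) \leq u(\theta) \leq \bar{u}(\theta), \\
\theta \leq u'(\theta) \leq \bar\tau(\theta), \\
u(\theta) + u'(\theta)^2/2 - \theta u'(\theta) \leq 1/\gamma, \\
u(\theta^\dagger) = \underline{u}(\theta^\dagger)
\end{array}
\right\}
\]

for any $\theta^\dagger \in [\theta_0, \bar\theta]$. Remark that for any $u \in \mathcal{U}(\theta^\dagger)$ we have $\Lambda(\theta, u(\theta), u'(\theta)) = 0$
for all $\theta \in [\theta_0, \theta^\dagger]$ and $\Lambda(\theta, u(\theta), u'(\theta)) \geq 0$ for all $\theta \in \,]\theta^\dagger, \bar\theta]$. We can therefore
always integrate the objective starting from $\theta^\dagger$. Accordingly, we define the
parameterized objective functional $V_{\theta^\dagger}$ as follows:

\[
V_{\theta^\dagger}(u) = \int_{\theta^\dagger}^{\bar\theta} \Lambda(\theta, u(\theta), u'(\theta)) \, d\theta.
\]

⁴We refer to Clarke (2013) for a thorough treatment of calculus of variations.
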
Restriction to extreme points of $\mathcal{U}(\theta^\dagger)$. We first prove the two following crucial
observations.

Lemma 4. For any $\theta^\dagger \in [\theta_0, \bar\theta]$, the set $\mathcal{U}(\theta^\dagger)$ is convex and is compact with
respect to the supremum-norm.

Proof. See appendix A.2.3. □

Lemma 5. For any $\theta^\dagger \in [\theta_0, \bar\theta]$, the functional $V_{\theta^\dagger} \colon \mathcal{U}(\theta^\dagger) \to \mathbb{R}$ is upper
semicontinuous. Moreover, if assumption 1 is satisfied, then it is also convex.

Proof. See appendix A.2.4. □


Lemmas 4 and 5 have the following implications: First, by the Krein-Milman
theorem (Aliprantis and Border, 2006, Theorem 7.68) it must be that the set $\mathcal{U}(\theta^\dagger)$
is the closed, convex hull of its extreme points and, in particular, that the set $\mathcal{E}(\theta^\dagger)$
of extreme points of $\mathcal{U}(\theta^\dagger)$ is non-empty. Second, Bauer's Maximum Principle
(Aliprantis and Border, 2006, Theorem 7.69) ensures that the functional $V_{\theta^\dagger}$ must
admit a maximizer which belongs to $\mathcal{E}(\theta^\dagger)$.
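A finite-dimensional analogue may help fix ideas about Bauer's Maximum Principle. The sketch below is only an illustration, not the functional setting of the text: it maximizes a convex function over the unit square and checks that the maximizer found on a grid is one of the four vertices, which are the extreme points of the square. The objective and the grid size are arbitrary choices made for the example.

```python
def objective(x, y):
    """A convex function of (x, y); this specific form is illustrative only."""
    return (x - 0.3) ** 2 + (y + 0.2) ** 2


# Extreme points of the unit square [0, 1] x [0, 1].
VERTICES = {(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)}


def grid_argmax(n=20):
    """Return the grid point of the unit square at which the objective is largest."""
    points = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]
    return max(points, key=lambda p: objective(*p))


best_point = grid_argmax()
```

In the text, the same logic is applied to the convex functional $V_{\theta^\dagger}$ on the compact convex set $\mathcal{U}(\theta^\dagger)$, whose extreme points are functions rather than corners of a square.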

A function $u \in \mathcal{U}(\theta^\dagger)$ is an extreme point if there do not exist distinct $u_1, u_2 \in \mathcal{U}(\theta^\dagger)$ and
$\alpha \in \,]0, 1[$ such that $u = \alpha u_1 + (1 - \alpha) u_2$. We provide next an equivalent and more
convenient definition of extreme points.


Definition 5. Let $C$ be a compact and convex subset of a locally convex topological
vector space $X$. A point $x \in C$ is an extreme point if, and only if, for every direction
$h \in X$ such that $h \neq 0$, we have $x + h \notin C$ or $x - h \notin C$, or both.

We proved in lemma 4 that $\mathcal{U}(\theta^\dagger)$ is a compact and convex subset of the normed
linear space $(C([\theta^\dagger, \bar\theta]), \|\cdot\|_\infty)$. Therefore, a function $u$ belongs to $\mathcal{E}(\theta^\dagger)$ if, and
only if, there does not exist any non-zero direction $h \in C([\theta^\dagger, \bar\theta])$ such that $u - h \in \mathcal{U}(\theta^\dagger)$ and
$u + h \in \mathcal{U}(\theta^\dagger)$. In the next lemma, we provide necessary conditions on the extreme
points of $\mathcal{U}(\theta^\dagger)$.


Lemma 6. If $u \in \mathcal{U}(\theta^\dagger)$ is an extreme point then:

(i) $u$ must be increasing and piecewise affine on the interval $[\theta^\dagger, 0]$, i.e., there exists
a countable set $I$ and a sequence $(a_i, b_i)_{i \in I}$ such that $0 \leq a_i < a_{i+1}$ and
$b_i \in \mathbb{R}$ for all $i \in I$, and that $u(\theta) = \max\{a_i \theta + b_i \mid i \in I,\ a_0 \theta^\dagger + b_0 = \underline{u}(\theta^\dagger)\}$
for all $\theta \in [\theta^\dagger, 0[$;

(ii) $u$ must be a convex connection between increasing affine and quadratic arcs
on the interval $]0, \bar\theta]$, i.e., there exists a countable set $J$, a sequence $(c_j)_{j \in J}$
such that $u(0) \leq c_j < c_{j+1} \leq 1/\gamma$ for any $j \in J$, as well as a collection of
intervals $([\underline{x}_j, \bar{x}_j[)_{j \in J}$ such that $[\underline{x}_j, \bar{x}_j[ \,\subseteq\, ]0, \bar\theta]$, that $\bar{x}_j < \underline{x}_{j+1}$ for all $j \in J$,
and that $u(\theta) = \theta^2/2 + c_j$ for every $j \in J$ and any $\theta \in [\underline{x}_j, \bar{x}_j[$. Moreover,
whenever $\theta \in \,]0, \bar\theta] \setminus \bigcup_{j \in J} [\underline{x}_j, \bar{x}_j[$, $u$ must be increasing and piecewise affine,
and must join two quadratic arcs while staying convex.

Proof. See appendix A.2.5. □

(a) An extreme point $u$. (b) Its induced investment rule $u'$.

Figure 1.5: An extreme point $u \in \mathcal{E}(\theta^\dagger)$ together with its induced investment rule $u'$.


Lemma 6 provides necessary conditions on the extreme points. It shows that
if $u \in \mathcal{E}(\theta^\dagger)$, then it must be the convex junction between increasing affine and
quadratic arcs. An example of an extreme point is depicted in figure 1.5a together with
its derivative in figure 1.5b. The reason why extreme points take this particular
form is that such curves either make the bounds on the derivative binding when $u'$
coincides with $\theta$ or $\bar\tau(\theta)$, or make the convexity constraint binding when $u'$ is
flat, i.e., $u$ is affine. However, due to the constraint that $u'(\theta) \geq 0$, the function $u$
cannot be quadratic if $\theta < 0$ because otherwise one would have $u'(\theta) = \theta < 0$.

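Lemma 6 has an important implication. When searching for an optimal solution
of (V), one can restrict attention without loss of optimality to functions that are
convex connections of linear and quadratic arcs. Let us define the function $u^\dagger$
given by:

\[
u^\dagger(\theta) =
\begin{cases}
\underline{u}(\theta^\dagger) + \bar\tau(\theta^\dagger)(\theta - \theta^\dagger) & \text{if } \theta \in [\theta^\dagger, \bar\tau(\theta^\dagger)[, \\
\bar{u}(\theta) & \text{if } \theta \in [\bar\tau(\theta^\dagger), \bar\theta].
\end{cases}
\]

This function corresponds to the pseudo-utility function implemented by a $\bar\tau(\theta^\dagger)$-pass-fail
rule restricted to the interval $[\theta^\dagger, \bar\theta]$. Indeed, remember that lemma 3
implies that $(u^\dagger)'(\theta)$ corresponds to the optimal investment after the cutoff $\theta^\dagger$. It
is easy to verify that:

\[
(u^\dagger)'(\theta) =
\begin{cases}
\bar\tau(\theta^\dagger) & \text{if } \theta \in [\theta^\dagger, \bar\tau(\theta^\dagger)[, \\
\theta & \text{if } \theta \in [\bar\tau(\theta^\dagger), \bar\theta],
\end{cases}
\tag{1.3}
\]

if $\theta^\dagger \leq \bar\theta - \sqrt{2/\gamma}$, and

\[
(u^\dagger)'(\theta) = \bar\tau(\theta^\dagger), \tag{1.4}
\]

if $\theta^\dagger > \bar\theta - \sqrt{2/\gamma}$. This corresponds to the investment rule that would be
implemented under a $\bar\tau(\theta^\dagger)$-pass-fail rule. Importantly, the function $u^\dagger$ satisfies
all the necessary conditions in lemma 6 and is an extreme point of $\mathcal{U}(\theta^\dagger)$. Also
importantly, $u^\dagger$ bounds any other function $u \in \mathcal{U}(\theta^\dagger)$ from above so $u^\dagger - u \geq 0$ for
any $u \in \mathcal{U}(\theta^\dagger)$. The final step of our proof is to show that $u^\dagger$ is an optimal solution.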
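The pass-fail pseudo-utility can be written down explicitly once the bounds are given a functional form. The sketch below is a hedged illustration: it assumes a quadratic cost, so that the lower bound is $\theta^2/2$, the upper bound is $\theta^2/2 + 1/\gamma$ and the highest reachable type is $\theta + \sqrt{2/\gamma}$. These three forms, the value of `GAMMA` and all function names are assumptions made for the example and are not stated in this passage.

```python
import math

GAMMA = 1.0  # cost scale gamma (illustrative value)


def tau_bar(theta):
    """Highest final type reachable at a cost of at most 1 (assumed quadratic cost)."""
    return theta + math.sqrt(2.0 / GAMMA)


def u_lower(theta):
    """Assumed lower bound: pseudo-utility of an agent who is never selected."""
    return theta ** 2 / 2.0


def u_upper(theta):
    """Assumed upper bound: pseudo-utility of an agent selected without investing."""
    return theta ** 2 / 2.0 + 1.0 / GAMMA


def u_dagger(theta, theta_dag):
    """Pass-fail pseudo-utility: affine up to the cutoff, upper bound afterwards."""
    cutoff = tau_bar(theta_dag)
    if theta < cutoff:
        return u_lower(theta_dag) + cutoff * (theta - theta_dag)
    return u_upper(theta)


def u_dagger_prime(theta, theta_dag):
    """Induced investment rule: jump to the cutoff, or stay put above it."""
    cutoff = tau_bar(theta_dag)
    return cutoff if theta < cutoff else theta


def allocation_probability(theta, theta_dag):
    """x(theta) = gamma * [u + u'^2 / 2 - theta * u'] evaluated at u_dagger."""
    u = u_dagger(theta, theta_dag)
    du = u_dagger_prime(theta, theta_dag)
    return GAMMA * (u + du ** 2 / 2.0 - theta * du)
```

Under these assumptions every type above $\theta^\dagger$ is selected with probability one, which is what a pass-fail rule should deliver, and the two branches of $u^\dagger$ join continuously at the cutoff.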
To do so, we first recall the following characterization of convex functionals. 


Lemma 7 (Above the tangent property for convex functions). Let $X$ be a normed
space, $C$ be a non-empty closed convex subset of $X$ and $\varphi \colon C \to \mathbb{R}$ be a Gâteaux
differentiable function, with Gâteaux derivative at $x \in C$ in direction $h \in X$ given
by $D\varphi(x)(h)$. Then, $\varphi$ is convex if, and only if:

\[
\varphi(y) \geq \varphi(x) + D\varphi(x)(y - x)
\]

for all $(x, y) \in C \times C$.


By lemma 4, we know that the set $\mathcal{U}(\theta^\dagger)$ is a compact and convex subset of the
normed linear space $(C([\theta^\dagger, \bar\theta]), \|\cdot\|_\infty)$. Moreover, we also proved in lemma 5 that
$V_{\theta^\dagger}$ is convex over $\mathcal{U}(\theta^\dagger)$ under assumption 1. We now prove that the functional
$V_{\theta^\dagger}$ is Gâteaux differentiable everywhere on $\mathcal{U}(\theta^\dagger)$ and we give a closed form for
its Gâteaux derivative.


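Lemma 8. The functional $V_{\theta^\dagger}$ is Gâteaux differentiable and has a Gâteaux
derivative at $u$ in direction $h$ given by:

\[
\begin{aligned}
DV_{\theta^\dagger}(u)(h) = \gamma \int_{\theta^\dagger}^{\bar\theta} \Big\{
 & - \Big[ u(\theta) + \tfrac{1}{2} u'(\theta)^2 - \theta u'(\theta) + u'(\theta) \big( u'(\theta) - \theta \big) \Big] f'(\theta) \\
 & + \Big[ u'(\theta) - 2 u''(\theta) \big( u'(\theta) - \theta \big) - u'(\theta) \big( u''(\theta) - 1 \big) \Big] f(\theta)
\Big\}\, h(\theta) \, d\theta.
\end{aligned}
\]

Proof. See appendix A.2.6. □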
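The integrand in lemma 8 can be recovered from the partial derivatives of $\Lambda$. The following is a sketch of that computation rather than the proof given in the appendix: it assumes that $u$ is twice differentiable, that $f$ is differentiable, and that the boundary term produced by the integration by parts vanishes or is handled separately.

```latex
\begin{align*}
\Lambda_x(\theta, x, y) &= \gamma\, y\, f(\theta), \\
\Lambda_y(\theta, x, y) &= \gamma \left( x + \tfrac{3}{2} y^2 - 2 \theta y \right) f(\theta), \\
\frac{d}{d\theta}\, \Lambda_y(\theta, u, u')
  &= \gamma \left( u + \tfrac{3}{2} u'^2 - 2 \theta u' \right) f'
   + \gamma \left( 3 u' u'' - u' - 2 \theta u'' \right) f, \\
\Lambda_x - \frac{d}{d\theta}\, \Lambda_y
  &= \gamma \Big\{ - \Big[ u + \tfrac{1}{2} u'^2 - \theta u' + u'(u' - \theta) \Big] f'
   + \Big[ u' - 2 u'' (u' - \theta) - u' (u'' - 1) \Big] f \Big\}.
\end{align*}
```

The last line uses $u + \tfrac{3}{2} u'^2 - 2\theta u' = u + \tfrac{1}{2} u'^2 - \theta u' + u'(u' - \theta)$ and $2u' - 3u'u'' + 2\theta u'' = u' - 2u''(u' - \theta) - u'(u'' - 1)$, which matches the grouping of terms in the statement of the lemma.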


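Hence, lemma 7 and lemma 8 together imply that:

\[
V_{\theta^\dagger}(v) \geq V_{\theta^\dagger}(u) + DV_{\theta^\dagger}(u)(v - u) \tag{1.5}
\]

for any $(u, v) \in \mathcal{U}(\theta^\dagger) \times \mathcal{U}(\theta^\dagger)$ and any $\theta^\dagger \in [\theta_0, \bar\theta]$. In particular, equation (1.5)
must be satisfied when $v = u^\dagger$ and when $u$ is an extreme point, since $\mathcal{E}(\theta^\dagger) \subset \mathcal{U}(\theta^\dagger)$.
That is:

\[
V_{\theta^\dagger}(u^\dagger) \geq V_{\theta^\dagger}(u) + DV_{\theta^\dagger}(u)(u^\dagger - u) \tag{1.6}
\]

for any $u \in \mathcal{E}(\theta^\dagger)$. We now prove the following important result.

Lemma 9. If assumption 1 is satisfied, then $DV_{\theta^\dagger}(u)(u^\dagger - u) \geq 0$ for any $u \in \mathcal{E}(\theta^\dagger)$.

Proof. See appendix A.2.7. □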


Lemma 9 is the final step of our proof. Indeed, equation (1.6) and lemma 9
together entail that $V_{\theta^\dagger}(u^\dagger) \geq V_{\theta^\dagger}(u)$ for any $u \in \mathcal{E}(\theta^\dagger)$, proving the optimality of
$u^\dagger$. The proof of lemma 9 relies on the fact that at any extreme point $u \in \mathcal{E}(\theta^\dagger)$, we
have either $u''(\theta) = 0$ and $u'(\theta) = a$ for some constant $a \geq 0$ on intervals where
$u$ is linear, or $u''(\theta) = 1$ and $u'(\theta) = \theta$ on intervals where $u$ is quadratic. This,
together with the fact that $u^\dagger(\theta) - u(\theta) \geq 0$ for any $\theta \in [\theta_0, \bar\theta]$, implies that the
integrand in the expression of the Gâteaux derivative given in lemma 8 is always
positive. Intuitively, this condition means that at a given cutoff $\theta^\dagger$, when we restrict
to the extreme points of the domain $\mathcal{U}(\theta^\dagger)$, deviating to the pseudo-utility $u^\dagger$ always
locally increases the value of the principal's objective. Note that any extreme point
$u \in \mathcal{E}(\theta^\dagger)$ can be implemented by an increasing step selection rule. The linear
parts of $u$ then correspond to regions where agents bunch at the next step and the
quadratic parts correspond to regions where agents have an investment cost too
large to reach the next step and therefore keep their type at zero cost. Lemma 9
therefore basically states that among all step selection rules, the best one is the one
with two steps: a step with value zero and a step with value one, separated by the
allocation cutoff $\bar\tau(\theta^\dagger)$. Optimizing the designer's payoff over the one-dimensional
parameter $\theta^\dagger$ ends the proof of theorem 1.
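The last step, optimizing over the one-dimensional parameter, can be sketched numerically. The example below is a minimal illustration under stated assumptions: a quadratic cost (so that agents reach at most $\sqrt{2/\gamma}$ above their type), a truncated exponential density, and a hypothetical support $[-2, 2]$. The parameter values and all function names are chosen for the example and are not taken from the text.

```python
import math

# Illustrative primitives (assumptions, not taken from the text).
RHO = 0.2                        # decay rate of the truncated exponential density
GAMMA = 1.0                      # cost scale
THETA_LO, THETA_HI = -2.0, 2.0   # hypothetical support of initial types
REACH = math.sqrt(2.0 / GAMMA)   # distance covered at a cost of exactly 1
Z = math.exp(-RHO * THETA_LO) - math.exp(-RHO * THETA_HI)


def cdf(theta):
    """Distribution function of the truncated exponential on the support."""
    return (math.exp(-RHO * THETA_LO) - math.exp(-RHO * theta)) / Z


def partial_mean(a, b):
    """Integral of theta * f(theta) between a and b, in closed form."""
    def antiderivative(t):
        return -(t + 1.0 / RHO) * math.exp(-RHO * t)
    return (antiderivative(b) - antiderivative(a)) / Z


def pass_fail_payoff(theta_dag):
    """Designer's payoff when every type above theta_dag ends up selected."""
    cutoff = theta_dag + REACH
    top = min(cutoff, THETA_HI)
    bunched = cutoff * (cdf(top) - cdf(theta_dag))  # types investing up to the cutoff
    passive = partial_mean(top, THETA_HI)           # types already above the cutoff
    return bunched + passive


def best_cutoff_type(n=4000):
    """Grid search for the payoff-maximizing lowest selected type."""
    grid = [THETA_LO + i * (THETA_HI - THETA_LO) / n for i in range(n + 1)]
    return max(grid, key=pass_fail_payoff)


theta_star = best_cutoff_type()
```

With these primitives the maximizer is slightly negative: the designer accepts some agents whose initial type is below her threshold because they invest up to a strictly positive cutoff.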


1.5. EXTENSIONS 


We study three extensions of our model. First, we add a capacity constraint to the
principal's allocation problem. Second, we solve the problem of a utilitarian social
planner. In both cases, the optimal selection rule remains pass-fail. Finally, we 
relax the principal’s commitment power. Instead of committing to a mechanism, 
the principal bases her allocation decision on the information provided by an 
intermediary with aligned preferences. We show that the principal-optimal 
allocation can be implemented by the intermediary through information design. 


1.5.1. Capacity constrained designer 


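In the baseline model, the designer possesses the same mass of resources as
the total mass of agents. In this extension, we assume instead that the designer is
capacity constrained. That is, she can allocate at most a positive measure $\kappa < 1$
of resources. Under that additional constraint, the problem of the designer writes
as follows:

\[
\begin{aligned}
\underset{\sigma, \tau}{\text{maximize}} \quad & \int_{\theta_0}^{\bar\theta} \tau(\theta)\, \sigma(\tau(\theta))\, f(\theta) \, d\theta \\
\text{subject to} \quad & \tau(\theta) \in \arg\max_{t \in T} \; \sigma(t) - \gamma c(t, \theta) && \text{(IC)} \\
\text{and} \quad & \int_{\theta_0}^{\bar\theta} \sigma(\tau(\theta))\, f(\theta) \, d\theta \leq \kappa && \text{(C)}
\end{aligned}
\]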

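Letting $\lambda \geq 0$ be the Lagrange multiplier on the constraint (C), we can observe that
the previous problem reduces to the same problem as (P) where the preference
threshold of the designer has been moved from 0 to $\lambda$:

\[
\begin{aligned}
\underset{\sigma, \tau}{\text{maximize}} \quad & \int_{\theta_0}^{\bar\theta} \big( \tau(\theta) - \lambda \big)\, \sigma(\tau(\theta))\, f(\theta) \, d\theta + \lambda \kappa \\
\text{subject to} \quad & \tau(\theta) \in \arg\max_{t \in T} \; \sigma(t) - \gamma c(t, \theta) && \text{(IC)}
\end{aligned}
\]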


Since the structure of the problem is unchanged, we can use the proof of theorem 1
verbatim, provided that the designer restricts herself to selection rules that are
zero below the preference threshold $\lambda$, and non-decreasing above. We thus have
the following result.


Proposition 4. Pass-fail selection rules solve the capacity constrained allocation 
problem. Moreover, if the capacity constraint is binding, then the optimal threshold 
is tighter than under no capacity constraint. 


This proposition follows from the standard Kuhn and Tucker method, and we
therefore omit its proof. The idea is as follows. If the solution to the constrained
program is the same as in the case when $\kappa = 1$ then the multiplier $\lambda$ is null and
the capacity constraint is not binding at the optimum. If the solution differs, then
$\lambda > 0$ and the capacity constraint kicks in. Whenever it is the case, the selection
cutoff must, by definition, weakly increase compared to the unconstrained selection
cutoff. Otherwise, it would mean that the designer allocates at least the same mass
of resources as under the unconstrained program, a contradiction.
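The effect of a binding capacity constraint can be illustrated by a constrained grid search over pass-fail rules. This is a sketch under the same kind of assumptions as before: a quadratic cost, a truncated exponential density on a hypothetical support $[-2, 2]$, and illustrative parameter values. The helper names are invented for the example.

```python
import math

# Illustrative primitives (assumptions, not taken from the text).
RHO, GAMMA = 0.2, 1.0
THETA_LO, THETA_HI = -2.0, 2.0   # hypothetical support of initial types
REACH = math.sqrt(2.0 / GAMMA)   # distance covered at a cost of exactly 1
Z = math.exp(-RHO * THETA_LO) - math.exp(-RHO * THETA_HI)


def cdf(theta):
    return (math.exp(-RHO * THETA_LO) - math.exp(-RHO * theta)) / Z


def partial_mean(a, b):
    """Integral of theta * f(theta) between a and b, in closed form."""
    def antiderivative(t):
        return -(t + 1.0 / RHO) * math.exp(-RHO * t)
    return (antiderivative(b) - antiderivative(a)) / Z


def pass_fail_payoff(theta_dag):
    """Designer's payoff when every type above theta_dag ends up selected."""
    cutoff = theta_dag + REACH
    top = min(cutoff, THETA_HI)
    return cutoff * (cdf(top) - cdf(theta_dag)) + partial_mean(top, THETA_HI)


def allocated_mass(theta_dag):
    """Mass of resources used: all types above theta_dag are selected."""
    return 1.0 - cdf(theta_dag)


def constrained_cutoff_type(kappa, n=4000):
    """Best lowest selected type among rules that use at most kappa resources."""
    grid = [THETA_LO + i * (THETA_HI - THETA_LO) / n for i in range(n + 1)]
    feasible = [t for t in grid if allocated_mass(t) <= kappa + 1e-12]
    return max(feasible, key=pass_fail_payoff)


unconstrained = constrained_cutoff_type(1.0)   # capacity never binds
constrained = constrained_cutoff_type(0.3)     # illustrative tight capacity
```

In this example the unconstrained rule uses more than the illustrative capacity of 0.3, so the constraint binds and the constrained search returns a higher lowest selected type, in line with proposition 4.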


1.5.2. Utilitarian welfare 


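We now consider the problem of a utilitarian social planner seeking to maximize
weighted social welfare. Formally, the planner's program writes as follows:

\[
\begin{aligned}
\underset{\sigma, \tau}{\text{maximize}} \quad & \int_{\theta_0}^{\bar\theta} \tau(\theta)\, \sigma(\tau(\theta))\, f(\theta) \, d\theta + \alpha \int_{\theta_0}^{\bar\theta} \big( \sigma(\tau(\theta)) - \gamma c(\tau(\theta), \theta) \big) f(\theta) \, d\theta \\
\text{subject to} \quad & \tau(\theta) \in \arg\max_{t \in T} \; \sigma(t) - \gamma c(t, \theta) && \text{(IC)}
\end{aligned}
\]

where $\alpha > 0$ is the Pareto weight the planner assigns to the welfare of agents.
When $\alpha < 1$, the planner cares more about the welfare of the principal, and
conversely when $\alpha > 1$. We prove that the welfare-optimal selection rules
are also deterministic, and have a smaller allocation threshold compared to the
principal-optimal rule.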


Proposition 5. The welfare-optimal pass-fail rule solves the problem of the planner. 
Moreover, the welfare-optimal allocation cutoff is lower than under the optimal 
selection rule of the principal. 


Proof. See appendix A.4. □

Proposition 5 states that taking into account the aggregate welfare of agents
does not affect the deterministic structure of the optimal selection rule, and leads to
less severe selection than under the principal-optimal selection rule. We illustrate
the discrepancy between the welfare-optimal and the principal-optimal selection
rules in figure 1.6. Naturally, taking into account the investment costs of agents
pushes the welfare-optimal allocation cutoff to the left of the principal-optimal
one.

(a) Optimal pass-fail selection rule. (b) Implemented investment.

Figure 1.6: The welfare-optimal pass-fail allocation (dashed lines) vs. the principal-optimal
allocation (solid lines), when $f(\theta) = \rho e^{-\rho\theta} / (e^{-\rho\theta_0} - e^{-\rho\bar\theta})$ with $\rho = 0.2$,
$\gamma = 1$, and $\alpha = 1$.

Unlike proposition 4, the proof of proposition 5 requires some adaptations
compared to the proof of theorem 1. The reason is that the planner cannot restrict
itself without loss to monotonic mechanisms as described in lemma 1. Indeed, the
planner, unlike the principal, may have an interest in accepting negative types if the
Pareto weight $\alpha$ is high enough. Nevertheless, the constraint (IC) still guarantees
that the selection rule $\sigma$ is a non-decreasing function over the domain $T$. The
proof only requires adapting the characterization of implementable pseudo-utility
functions under such allocation functions, and otherwise works in the same way
as the proof of theorem 1. Indeed, when the planner's program is rewritten in
variational form, the agents' welfare term is linear in the pseudo-utility, thus not
affecting the convexity of the objective functional.
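The comparison between the two cutoffs can be reproduced numerically within the family of pass-fail rules. The sketch below uses the values of $\rho$, $\gamma$ and $\alpha$ quoted in the caption of figure 1.6, but the support $[-2, 2]$, the quadratic cost and the midpoint-rule integration are assumptions made for the example, so it should not be read as a replication of the figure.

```python
import math

RHO, GAMMA, ALPHA = 0.2, 1.0, 1.0   # values quoted in the caption of figure 1.6
THETA_LO, THETA_HI = -2.0, 2.0      # hypothetical support of initial types
REACH = math.sqrt(2.0 / GAMMA)      # distance covered at a cost of exactly 1
Z = math.exp(-RHO * THETA_LO) - math.exp(-RHO * THETA_HI)


def density(theta):
    return RHO * math.exp(-RHO * theta) / Z


def payoffs(cutoff, n=2000):
    """Return (principal's payoff, agents' welfare) under a pass-fail rule.

    Midpoint rule over initial types; types within REACH below the cutoff
    invest up to it, types above it stay put, all others are rejected.
    """
    step = (THETA_HI - THETA_LO) / n
    principal = agents = 0.0
    for i in range(n):
        theta = THETA_LO + (i + 0.5) * step
        if theta < cutoff - REACH:
            continue  # too far from the cutoff: no investment, no allocation
        final = max(theta, cutoff)
        cost = GAMMA * (final - theta) ** 2 / 2.0
        weight = density(theta) * step
        principal += final * weight
        agents += (1.0 - cost) * weight
    return principal, agents


def optimal_cutoffs(alpha, grid_size=200):
    """Grid search returning (principal-optimal, welfare-optimal) cutoffs."""
    upper = THETA_HI + REACH
    table = []
    for k in range(grid_size + 1):
        cutoff = THETA_LO + k * (upper - THETA_LO) / grid_size
        table.append((cutoff,) + payoffs(cutoff))
    principal_best = max(table, key=lambda row: row[1])[0]
    planner_best = max(table, key=lambda row: row[1] + alpha * row[2])[0]
    return principal_best, planner_best


principal_cutoff, welfare_cutoff = optimal_cutoffs(ALPHA)
```

Because the agents' welfare falls as the cutoff rises, adding it to the objective with a positive weight can only move the maximizer to the left, which is the direction stated in proposition 5.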


1.5.3. Implementation through information design 


In this section, we consider a slightly modified setup. There are three players: the
receiver, the sender and the agents. The receiver decides whether to allocate the
resource to the agents, $a \in A = \{0, 1\}$. The agents choose whether to invest in a
new type. Their payoffs are the same as in the baseline model. We let $\pi \in \Delta(\Theta)$
denote the prior probability measure on agents' types. An investment strategy for
an agent is a stochastic mapping $\tau \colon \Theta \to \Delta(T)$ associating any initial type $\theta$ to a
conditional distribution $\tau(\theta)$ over final types. For any state $\theta \in \Theta$, we define the
interim expected cost of agent $\theta$ as:

\[
C(\tau, \theta) = \gamma \int_T c(t, \theta) \, \tau(dt \mid \theta).
\]


The sender provides information to the receiver by committing to a statistical
experiment $(\sigma, S)$ (Blackwell, 1951, 1953), which consists in an endogenously
chosen set of signal realizations $S$ and a stochastic mapping $\sigma \colon T \to \Delta(S)$
associating any realized final type $t$ to a probability distribution $\sigma(t)$ over $S$.
We denote by $\sigma$ the collection $(\sigma(t))_{t \in T}$. We also assume that the sender has the
same payoff as the receiver.

Given an experiment $(\sigma, S)$, when he anticipates the investment strategy of any
agent of type $\theta$ to be $\tau(\theta)$, the receiver's posterior belief about the type of an agent whose
signal realization is $s$ is given by Bayes' rule:

\[
\mu_{\sigma, \tau}(\tilde{T} \mid s) = \frac{\int_{\tilde{T} \times \Theta} \sigma(s \mid t) \, \tau(dt \mid \theta) \, \pi(d\theta)}{\int_{T \times \Theta} \sigma(s \mid t) \, \tau(dt \mid \theta) \, \pi(d\theta)},
\]

for any Borel set $\tilde{T} \subseteq T$ and any $s \in \bigcup_{t \in T} \operatorname{supp}(\sigma(t))$. Let

\[
\hat{t}_{\sigma, \tau}(s) = \int_T t \, \mu_{\sigma, \tau}(dt \mid s)
\]

be the associated receiver's posterior expectation of an agent's final type
conditional on signal $s$.

An allocation strategy for the receiver is a stochastic mapping $\alpha \colon S \to \Delta(A)$
associating any signal realization $s$ to a probability distribution $\alpha(s)$ over allocation
decisions. With a slight abuse of notation, we let $\alpha \colon S \to [0, 1]$ be the measurable
function such that $\alpha(s) = \alpha(1 \mid s)$ for any signal realization $s \in S$. For any
anticipated strategy of the agent $\tau$, the receiver's optimal strategy $\alpha$ must maximize
the expected posterior mean type conditional on allocation, given by

\[
\int_{S \times T \times \Theta} \hat{t}_{\sigma, \tau}(s) \, \alpha(s) \, \sigma(ds \mid t) \, \tau(dt \mid \theta) \, \pi(d\theta).
\]

Let us define the set:

\[
S(\sigma, \tau) = \left\{ s \in S \mid \hat{t}_{\sigma, \tau}(s) \geq 0 \right\},
\]

for any $(\sigma, \tau)$. The receiver's optimal strategy under any $\sigma$ and $\tau$ is thus given by

\[
\alpha_{\sigma, \tau}(s) = \mathbb{1}_{S(\sigma, \tau)}(s),
\]

for all $s \in S$.
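With finitely many final types and signals, the posterior and the receiver's threshold rule reduce to a few sums. The sketch below takes the distribution of final types as given, which amounts to fixing the anticipated investment strategy. The two-type prior, the two-signal experiment and the function names are hypothetical choices made for the illustration.

```python
# Distribution of final types under the anticipated investment strategy (illustrative).
PRIOR = {-1.0: 0.6, 1.0: 0.4}

# Experiment: probability of each signal conditional on the final type (illustrative).
EXPERIMENT = {
    -1.0: {"high": 0.5, "low": 0.5},
    1.0: {"high": 1.0},
}


def posterior(prior, experiment, signal):
    """Bayes' rule: distribution over final types conditional on a signal."""
    joint = {t: prior[t] * experiment[t].get(signal, 0.0) for t in prior}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("signal has zero probability under the experiment")
    return {t: weight / total for t, weight in joint.items()}


def posterior_mean(prior, experiment, signal):
    """Posterior expectation of the final type conditional on a signal."""
    beliefs = posterior(prior, experiment, signal)
    return sum(t * p for t, p in beliefs.items())


def receiver_allocates(prior, experiment, signal):
    """Threshold rule: allocate exactly when the posterior mean is non-negative."""
    return 1 if posterior_mean(prior, experiment, signal) >= 0.0 else 0
```

In this example the signal `"high"` pools the high type with half of the low types and still leaves a positive posterior mean, so the receiver allocates after it and refuses after `"low"`.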
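Given an experiment $(\sigma, S)$, the interim probability that the receiver allocates
the object when he anticipates $\tau$ while the agent's actual strategy is $\tau'$ is

\[
P_{\sigma, \tau}(\tau', \theta) = \int_{S \times T} \alpha_{\sigma, \tau}(s) \, \sigma(ds \mid t) \, \tau'(dt \mid \theta),
\]

for every $\theta \in \Theta$. Thus, the agents' interim payoff when the receiver anticipates $\tau$
but the actual investment strategy is $\tau'$ is given by:

\[
U_{\sigma, \tau}(\tau', \theta) = P_{\sigma, \tau}(\tau', \theta) - C(\tau', \theta).
\]

We thus say that $\tau$ is agent-incentive-compatible if it is a best response to the
receiver's optimal strategy under experiment $(\sigma, S)$. That is:

\[
U_{\sigma, \tau}(\tau, \theta) \geq U_{\sigma, \tau}(\tau', \theta), \tag{A-IC}
\]

for any $\theta \in \Theta$ and any $\tau'$.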
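Let us define the sender's equilibrium payoff under experiment $(\sigma, S)$ as follows:

\[
V(\sigma, \tau) = \int_{S \times T \times \Theta} \hat{t}_{\sigma, \tau}(s) \, \alpha_{\sigma, \tau}(s) \, \sigma(ds \mid t) \, \tau(dt \mid \theta) \, \pi(d\theta).
\]

The problem for the designer is to find an experiment $(\sigma, S)$ that maximizes her ex-ante
expected payoff given that $\tau$ must be agent-incentive-compatible under $(\sigma, S)$, that
is:

\[
\underset{\sigma, \tau}{\text{maximize}} \quad V(\sigma, \tau) \quad \text{subject to (A-IC)}.
\]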


Akin to Kamenica and Gentzkow (2011), we prove a recommendation principle
entailing that the choice of an optimal experiment $(\sigma, S)$ can be reduced to the
choice of an allocation recommendation rule $\sigma \colon T \to \Delta(A)$. The proof follows
similar steps as in Perez-Richet and Skreta (2022b) and goes as follows: Start
from an arbitrary experiment $(\sigma, S)$ under which the equilibrium is $(\alpha_{\sigma, \tau}, \tau)$.
Then, define the garbled experiment $\varsigma$ which pools together all signals leading the
designer to allocate under the original experiment $\sigma$. The experiment induces
the designer to follow the recommendations and also maintains the same interim
allocation probabilities for the agents so no type has any incentive to deviate from
its investment strategy under $\varsigma$. Hence, garbling the original experiment leads
to the same payoffs for the designer as well as agents. Although the result that
the recommendations are followed under the new experiment is standard, the result
that $\tau$ remains a best response to $\varsigma$ is distinctive of our framework. We state the
recommendation principle more formally in the next lemma, whose proof is in appendix A.5.


Lemma 10 (Recommendation principle). Fix an experiment $(\sigma, S)$ and let $\tau$ be
(A-IC). Consider the experiment $(\varsigma, A)$ defined by

\[
\varsigma(1 \mid t) = \sigma(S(\sigma, \tau) \mid t)
\]

for every $t \in T$. Then, all the following properties are satisfied:

(i) The receiver always follows the sender's recommendations in equilibrium,
i.e., $\alpha_{\varsigma, \tau}(1) = 1$;

(ii) Interim allocation probabilities are the same under $\sigma$ and $\varsigma$, i.e., $P_{\sigma, \tau}(\tau', \theta) = P_{\varsigma, \tau}(\tau', \theta)$
for any $\tau'$ and $\theta$;

(iii) $\tau$ is (A-IC) under $\varsigma$;

(iv) Equilibrium payoffs are the same under $\sigma$ and $\varsigma$, i.e., $U_{\sigma, \tau}(\tau, \theta) = U_{\varsigma, \tau}(\tau, \theta)$
for any $\theta$ and $V(\sigma, \tau) = V(\varsigma, \tau)$.

Proof. See appendix A.5. □
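Properties (i) and (ii) of the lemma can be checked on a small discrete example. The sketch below fixes the distribution of final types (that is, it holds the investment strategy fixed), pools the signals after which the receiver allocates into a recommendation `1`, and leaves the rest in a recommendation `0`. The three-type prior, the three-signal experiment and the function names are hypothetical.

```python
# Distribution of final types under the anticipated strategy (illustrative).
PRIOR = {-1.0: 0.5, 0.5: 0.3, 2.0: 0.2}

# Original experiment: probability of each signal given the final type (illustrative).
EXPERIMENT = {
    -1.0: {"a": 0.2, "b": 0.1, "c": 0.7},
    0.5: {"a": 0.5, "b": 0.4, "c": 0.1},
    2.0: {"a": 0.8, "b": 0.2, "c": 0.0},
}


def posterior_mean(prior, experiment, signal):
    """Posterior expectation of the final type conditional on a signal."""
    weights = {t: prior[t] * experiment[t].get(signal, 0.0) for t in prior}
    total = sum(weights.values())
    return sum(t * w for t, w in weights.items()) / total


def allocation_signals(prior, experiment):
    """Signals after which the receiver allocates (posterior mean is non-negative)."""
    signals = {s for dist in experiment.values() for s, p in dist.items() if p > 0}
    return {s for s in signals if posterior_mean(prior, experiment, s) >= 0.0}


def garble(prior, experiment):
    """Pool all allocation signals into recommendation 1, the others into 0."""
    accept = allocation_signals(prior, experiment)
    garbled = {}
    for t, dist in experiment.items():
        prob_one = sum(p for s, p in dist.items() if s in accept)
        garbled[t] = {1: prob_one, 0: 1.0 - prob_one}
    return garbled


accepted = allocation_signals(PRIOR, EXPERIMENT)
recommendation = garble(PRIOR, EXPERIMENT)
```

Because the probability of recommendation `1` for each final type equals the probability of an allocation signal under the original experiment, allocation probabilities are unchanged type by type, and the posterior means after each recommendation show that the receiver is willing to obey.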


With a slight abuse of notation, we let $\sigma \colon T \to [0, 1]$ be the measurable
function such that $\sigma(t) = \sigma(1 \mid t)$ for all $t \in T$. Henceforth, we refer to the function
$\sigma$ as the sender's recommendation rule. Under that formulation, the ex-ante expected
payoff for the sender is given by:

\[
V(\sigma, \tau) = \int_{T \times \Theta} t \, \sigma(t) \, \tau(dt \mid \theta) \, \pi(d\theta).
\]

In turn, the interim payoff for the agent is given by:

\[
U_\sigma(\tau, \theta) = \int_T \sigma(t) \, \tau(dt \mid \theta) - C(\tau, \theta).
\]


When choosing a recommendation rule, the designer must make sure that it is
individually rational for the receiver to follow its recommendation but also that
it is agent-incentive-compatible. The sender's recommendation rule is
receiver-individually-rational if the receiver's expected payoff conditional on an allocation
recommendation is non-negative, i.e., $\int_{T \times \Theta} t \, \sigma(t) \, \tau(dt \mid \theta) \, \pi(d\theta) \geq 0$, and her
expected payoff following a non-allocation recommendation is non-positive, i.e.,
$\int_{T \times \Theta} t \, (1 - \sigma(t)) \, \tau(dt \mid \theta) \, \pi(d\theta) \leq 0$. Combining these two inequalities, we obtain:

\[
V(\sigma, \tau) \geq \max\left\{ 0, \int_{T \times \Theta} t \, \tau(dt \mid \theta) \, \pi(d\theta) \right\}. \tag{R-IR}
\]
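The step from the two obedience inequalities to (R-IR) is short enough to spell out. The sketch below only uses the definition of $V(\sigma, \tau)$ under a recommendation rule and the linearity of the integral.

```latex
\begin{align*}
\int_{T \times \Theta} t \,\bigl(1 - \sigma(t)\bigr)\, \tau(dt \mid \theta)\, \pi(d\theta) \leq 0
  &\iff \int_{T \times \Theta} t \, \tau(dt \mid \theta)\, \pi(d\theta)
        \leq \int_{T \times \Theta} t \, \sigma(t)\, \tau(dt \mid \theta)\, \pi(d\theta)
        = V(\sigma, \tau), \\
\int_{T \times \Theta} t \, \sigma(t)\, \tau(dt \mid \theta)\, \pi(d\theta) \geq 0
  &\iff V(\sigma, \tau) \geq 0.
\end{align*}
```

Both lower bounds on $V(\sigma, \tau)$ must hold at once, so $V(\sigma, \tau)$ is at least the larger of the two, which is the maximum in (R-IR).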
The sender's problem thus consists in finding the recommendation rule solving:

\[
\underset{\sigma, \tau}{\text{maximize}} \quad V(\sigma, \tau) \quad \text{subject to (R-IR) and (A-IC)}.
\]


We now argue that the constraint (R-IR) in the sender's problem can be removed without
loss of generality. Indeed, a non-informative experiment achieves the lower bound
required by (R-IR) and satisfies the (A-IC) constraint. A non-informative experiment
induces a constant recommendation rule, thus $\int_{T \times \Theta} t \, \tau(dt \mid \theta) \, \pi(d\theta) = \theta_\pi$, the prior mean type, since
there is no way for agents to affect the allocation probability by investing. If $\theta_\pi \geq 0$
then the sender fixes $\sigma(t) = 1$ for every $t$ and if $\theta_\pi < 0$ then she fixes $\sigma(t) = 0$ for
every $t$. Hence, the designer's payoff is $\max\{0, \theta_\pi\}$ which corresponds to the lower
bound in (R-IR). But, remark that any recommendation rule solving the relaxed
program

\[
\underset{\sigma, \tau}{\text{maximize}} \quad V(\sigma, \tau) \quad \text{subject to (A-IC)} \tag{P}
\]

must give the designer at least the value she obtains under an uninformative experiment
and thus satisfy constraint (R-IR). Slightly abusing notation, we henceforth
denote by $\tau(\theta)$ the designer's preferred selection of the correspondence
$\theta \mapsto \arg\max_{t \in T} \sigma(t) - \gamma c(t, \theta)$. Under that formulation, the optimization program of the
designer can be stated as our original allocation problem:

\[
\begin{aligned}
\underset{\sigma, \tau}{\text{maximize}} \quad & \int_{\theta_0}^{\bar\theta} \tau(\theta)\, \sigma(\tau(\theta))\, f(\theta) \, d\theta \\
\text{subject to} \quad & \tau(\theta) \in \arg\max_{t \in T} \; \sigma(t) - \gamma c(t, \theta) && \text{(A-IC)}
\end{aligned}
\]

Interestingly, this equivalence implies that the designer has no additional value
when she has more commitment power.


Proposition 6. For the designer, commitment to a mechanism has no additional value
compared to commitment to an experiment only.



1.6. CONCLUSION 


We study the optimal design of non-market allocation mechanisms that take into 
account agents’ productive investment incentives. In our baseline model, a principal 
has a unit mass of resources to allocate to a unit mass of agents. The agents are 
characterized by a type, which is the only payoff-relevant variable for the principal. 
Agents undertake costly investments, the outcome of which is a type transformation 
observable by the principal. The principal wishes to allocate the good to agents 
whose types are above some ideal threshold. She commits ex-ante to a selection
rule, contingent on the outcome of the type improvement. Our main result states 
that pass-fail selection rules are optimal, under the assumption that the preference 
threshold of the principal lies in the right tail of the distribution of agents’ initial 
types and that the agents’ investment costs are increasing and convex in the amount 
of their investment. 

We also cover three extensions of the model. First, we consider a capacity 
constrained designer. Then we consider the problem faced by a utilitarian social 
planner. In both cases, we show that pass-fail rules remain optimal. The optimal 
cutoff increases when the capacity constraint is binding. Conversely, taking into 
account agents’ costs when maximizing social welfare leads the planner to choose 
a lower threshold than in the baseline solution. Finally, we show that the optimal 
allocation can be implemented by an information designer. This implies that 
weakening the principal’s commitment power does not reduce her optimal payoff. 

There are several interesting directions for future work. The first, which is 
the most natural but nevertheless challenging, is to extend the characterization of 
optimal mechanisms to more general distributions and cost functions. Second, 
characterizing selection rules that would combine productive investment incentives 
together with the possibility that agents might engage in falsification or, equivalently, 
in costly signaling, is also a promising avenue. Finally, adding affirmative action 
constraints to our problem, e.g., in the form of quotas, is also an important 
extension, whose resolution would allow for the design of fairer resource allocation 
mechanisms. 
2. PERSUADING A WISHFUL THINKER


Abstract 


We analyze a model of persuasion in which Receiver forms wishful 
non-Bayesian beliefs. The effectiveness of persuasion depends on Re- 
ceiver’s material stakes: it is more effective when intended to encourage 
risky behavior that potentially leads to a high payoff and less effective 
when intended to encourage more cautious behavior. We illustrate this 
insight with applications showing why informational interventions are 
often ineffective in inducing greater investment in preventive health 
treatments, how financial advisors might take advantage of their clients'
overoptimistic beliefs and why strategic information disclosure to voters
with different partisan preferences can lead to belief polarization in an 
electorate. 


2.1. INTRODUCTION 


It is generally assumed in models of strategic communication that receivers update 
beliefs in a perfectly rational manner, as would a Bayesian statistician. Yet, a 
substantial literature in psychology and behavioral economics shows that the 
process by which individuals interpret information and form beliefs is not guided 
solely by a desire for accuracy but often depends on their motivations and material 
incentives. This phenomenon is generally referred to as motivated inference (Kunda, 
1987, 1990), and a common manifestation of it is wishful thinking: the tendency of 


¹This chapter is joint work with Daniel Barreto and previously circulated under the title
“Wishful Thinking: Persuasion and Polarization.” This version of the paper is the one submitted
to the journal Games & Economic Behavior on February 28, 2022, currently at the “Reject and
Resubmit” stage. We would like to point out that the version presented in this thesis is being
extensively modified in view of the resubmission of the paper. We thank Jeanne Hagenbach and
Eduardo Perez-Richet for their support. We also thank S. Nageeb Ali, Roland Bénabou, Michele
Fioretti, Alexis Ghersengorin, Simon Gleyze, Emeric Henry, Deniz Kattwinkel, Frédéric Koessler,
Laurent Mathevet, Meg Meyer, Daniel Monte, Nikhil Vellodi, Adrien Vigier and Yves Le Yaouanq
for their valuable feedback and comments, as well as seminar audiences at Sciences Po, Paris
School of Economics, Sao Paulo School of Economics (FGV) and at the Econometric Society
European Meeting 2021. All remaining errors are ours. This project has received funding from
the European Research Council (ERC) under the European Union’s Horizon 2020 research and
innovation programme (grant agreements 850996, MOREV, and 101001694, IMEDMC).
individuals to let their preferences about outcomes influence the way they process
information, leading to beliefs that are systematically biased towards outcomes
they wish to be true.² In this paper we investigate how wishful thinking affects the
effectiveness of persuasion, i.e., the probability or frequency with which a sender
is able to induce a receiver to take her preferred action.

Following Caplin and Leahy (2019), we propose a model in which the receiver’s 
belief updating rule is non-Bayesian: after observing an informative signal, Receiver
forms beliefs by trading off their anticipatory value against the psychological cost 
of distorting beliefs away from Bayesian ones. As a result, Receiver’s beliefs are 
stakes-dependent, i.e., they depend on his preferences, and overweight the state 
associated with the highest payoff, giving rise to overoptimism. 

Distortions in beliefs lead to distortions in Receiver’s behavior: some actions 
end up being favored, meaning that they are taken more often (i.e., after the
reception of a strictly greater set of possible signals) relative to a Bayesian decision- 
maker. When he only has two available actions, wishful thinking leads Receiver to 
favor the action associated with the highest payoff and the highest payoff variability. 
If one of the two actions induces the highest possible payoff and the other induces 
the highest payoff variability, then which of the two is favored depends on the 
magnitude of Receiver’s belief distortion cost. As such, the effectiveness of 
information provision as a tool to incentivize agents might vary with individuals’ 
material stakes: persuasion is more effective when it is aimed at encouraging 
behavior that is risky but can potentially yield very high returns and less effective 
when it is aimed at encouraging more cautious behavior. We illustrate this insight 
in applications in which wishful beliefs can play an important role. 


Application 1: Information Provision and Preventive Health Care. In this 
application a public health agency designs an information policy about the risk of 
infection of an illness in order to promote a preventive treatment that can be adopted 
by individuals at some cost. Since not adopting the treatment is the action that can 
potentially yield the highest payoff (in case the illness is not severe) and also the 
action with the highest payoff variability, it is favored by wishful receivers. As such, 
information campaigns aimed at promoting preventive behavior are less effective. 
We also show how the effectiveness of information campaigns is affected by the
severity of the disease and the effectiveness of the treatment.


²There exists abundant experimental evidence of wishful thinking. See in particular Bénabou
and Tirole (2016), page 150, and Benjamin (2019), Section 9, as well as, e.g., Weinstein (1980),
Mijović-Prelec and Prelec (2010), Mayraz (2011), Heger and Papageorge (2018), Coutts (2019),
Engelmann, Lebreton, Schwardmann, van der Weele, and Chang (2019) or Jiao (2020).
This application sheds light on the stylized fact that individuals are consistently
investing too little in preventive health care treatments, even if offered at low 
prices (especially in developing countries, see Dupas, 2011; Chandra, Handel, and 
Schwartzstein, 2019; Kremer, Rao, and Schilbach, 2019, Section 3.1) and that 
informational interventions are often ineffective in inducing more investment in 
preventive health care devices (see, in particular, Dupas, 2011, Section 4, and 
Kremer et al., 2019, Section 3.3). Recent literature conjectures that individuals 
might not be responsive to such information campaigns because they prefer to hold 
optimistic prospects about their health risks (see Schwardmann, 2019 and Kremer 
et al., 2019, Section 3.3).³ Our model formalizes this argument.


Application 2: Persuading a Wishful Investor. In this application, we consider 
the interaction between a financial broker and her potential client. The broker 
designs reports about the (continuously distributed) return of some risky financial 
product to persuade the client to buy the asset. We show that a financial broker 
interested in selling a risky product is always more effective when persuading a 
wishful investor. 

This application formalizes why some professional financial advisors might 
sometimes not act in the best interest of their clients by making investment 
recommendations that take advantage of their biases and mistaken beliefs (see, 
for instance, Mullainathan, Noeth, and Schoar, 2012 or Beshears, Choi, Laibson, 
and Madrian, 2018, Section 9) as well as why some consulting firms seem to 
specialize in advice misconduct and cater to biased consumers (Egan, Matvos, and 
Seru, 2019). It also helps explain why the online betting industry puts so much
effort into persuasion. Indeed, Babad and Katz (1991) document that individuals 
generally display wishful thinking when they take part in lotteries: they prefer to 
think they will win and are therefore more receptive to information encouraging 
risky bets. 


Application 3: Public Persuasion and Political Polarization. Belief polariza- 
tion along partisan lines is a pervasive and much debated feature of contemporary 
societies. Although such polarization can be partly caused by differential access to 
information, evidence suggests that it is exacerbated by the fact that individuals 
tend to make motivated inferences about the same piece of information (Babad, 
1995; Thaler, 2020). 


³ There exists compelling experimental evidence that such self-deception exists in the medical
testing context (Lerman, Hughes, Lemon, Main, Snyder, Durham, Narod, and Lynch, 1998; Oster,
Shoulson, and Dorsey, 2013; Ganguly and Tasoff, 2017).
In this application we explore the relationship between optimal information
disclosure to wishful citizens and belief polarization. Following Alonso and Camara 
(2016), we model a majority voting setting in which an electorate, differentiated in 
terms of partisan preferences, uses information disclosed by a politician to vote 
on a proposal. Wishful thinking leads voters with different preferences to adopt 
different beliefs after being exposed to a public signal: those voting against or for 
the proposal distort their beliefs in opposite directions, giving rise to polarization. 
Sender’s optimal public experiment consists in persuading the median voter, which 
maximizes the number of voters distorting beliefs in opposite directions. We 
show that if partisan preferences are symmetrically distributed around the median, 
then Sender’s optimal information policy generates maximal belief polarization in 
the electorate as a byproduct. This adds nuance to the argument that motivated 
thinking is one of the drivers of polarization: not only can motivated thinking lead 
to polarization, but the strategic disclosure of information to a motivated electorate 
can also accentuate this tendency.⁴


Related literature. The persuasion and information design literatures initially
focused on the problem of influencing rational Bayesian decision-makers, as in
the seminal contributions of Kamenica and Gentzkow (2011) and Bergemann and
Morris (2016). By introducing non-Bayesian updating in the form of motivated
belief formation, we contribute to the literature studying persuasion of receivers
subject to mistakes in probabilistic inferences.⁶,⁷ Levy, Moreno de Barreda,
and Razin (2018) analyze a Bayesian persuasion problem where a sender can 
send multiple signals to a receiver subject to correlation neglect. Benjamin, 
Bodoh-Creed, and Rabin (2019) provide an example of a persuasion game where


⁴ This application is related to the paper by Le Yaouanq (2021), who constructs a model of large
elections with motivated voters. As in our model, the formation of motivated beliefs by citizens
leads voters with different preferences to hold different beliefs after observing the same information.
We find, as he does, that greater heterogeneity in partisan preferences increases belief polarization
but has no effect on the policy implemented in equilibrium. This is, however, the consequence of a
different modelling assumption, namely that information is endogenously designed to persuade
the median voter, whose vote is not distorted relative to a Bayesian voter.

⁵ See Bergemann and Morris (2019) and Kamenica (2019) for reviews of this literature.

⁶ See Benjamin (2019) for a review of the literature. In particular, wishful thinking belongs to
the preference-biased inferences reviewed in Benjamin (2019), Section 9.

⁷ It is interesting to note that an active literature also explores how errors in strategic reasoning
(Eyster, 2019) affect equilibrium outcomes in strategic communication games. Although in our 
model Receiver understands all the strategic issues, we believe, nevertheless, that it is important 
to mention that players’ misunderstanding of their strategic environment might also lead them to 
make errors in statistical inference even if they update beliefs via Bayes’ rule, as in Mullainathan, 
Schwartzstein, and Shleifer (2008), Ettinger and Jehiel (2010), Hagenbach and Koessler (2020) and 
Eliaz, Spiegler, and Thysen (2021b,a) who consider communication games where players make 
inferential errors because of a coarse understanding of their environment. 
Receiver exhibits base-rate neglect when updating beliefs. In de Clippel and Zhang
(2020), the receiver holds subjective beliefs which belong to a broader class of
distorted Bayesian posteriors. In contrast, in our model, Receiver’s belief formation
process optimally trades off the benefits and costs associated with maintaining
non-Bayesian beliefs, as in the work of Caplin and Leahy (2019).

On the one hand, we assume that Receiver’s value from maintaining inaccurate 
beliefs comes from the anticipation of the payoff he will achieve in equilibrium. 
Intuitively, it represents the idea that individuals might derive utility from the 
anticipation of future outcomes, be they good or bad. This hypothesis has been
widely used in the literature to study how anticipatory emotions affect physical 
choices (see, e.g., Loewenstein, 1987; Caplin and Leahy, 2001) as well as choices 
of beliefs (Bénabou and Tirole, 2002; Brunnermeier and Parker, 2005; Bracha and 
Brown, 2012; Caplin and Leahy, 2019). Receiver’s choice of beliefs is thus a way 
of satisfying his psychological need to be optimistic about the best-case outcomes 
or, on the contrary, to avoid the dread and anxiety associated with the worst-case 
outcomes. This hypothesis is supported experimentally by Engelmann et al. (2019), 
who find significant evidence that wishful thinking is caused by the desire to reduce 
anxiety associated with anticipating bad events. It is important to note that while 
anticipatory utility may be a strong motive for manipulating one’s beliefs, it is not 
the only possible one. This differentiates wishful thinking from the more general 
concept of motivated reasoning, which is usually defined as the degree to which 
individuals’ cognition is affected by their motivations.⁸ Motivations other
than anticipated payoffs have been explored in the literature, such as cognitive
dissonance avoidance (Akerlof and Dickens, 1982; Golman, Loewenstein, Moene, 
and Zarri, 2016), preference to believe in a “Just World” (Bénabou and Tirole, 
2006), maintaining high motivation when individuals are aware of being subject to 
a form of time-inconsistency (Bénabou and Tirole, 2002, 2004) or satisfying the 
need to belong to a particular identity (Bénabou and Tirole, 2011). 

On the other hand, we assume distorting beliefs away from the Bayesian 
benchmark is subject to some psychological cost. This assumption reflects the 
idea that, under a motivated cognition process (Kunda, 1987, 1990), individuals 
may use sophisticated mental strategies such as manipulating their own memory 
(Bénabou, 2015; Bénabou and Tirole, 2016),⁹ avoiding freely available information
(Golman, Hagmann, and Loewenstein, 2017) or creating elaborate narratives 


⁸ See Krizan and Windschitl (2009) for a more detailed discussion of the differences between
wishful thinking and motivated reasoning.

⁹ For experimental evidence on memory manipulation see, e.g., Saucet and Villeval (2019),
Carlson, Maréchal, Oud, Fehr, and Crockett (2020) and Chew, Huang, and Zhao (2020).
supporting their bad choices or inaccurate claims to justify their preferred beliefs.
Our assumption on the cost function captures, in “reduced form,” the fact that
implementing such mental strategies comes at a cost when desired beliefs deviate
from the Bayesian rational ones. In contrast, Brunnermeier and Parker
(2005) model the cost of erroneous beliefs as the instrumental loss associated 
with the inaccurate choices induced by such beliefs. It is worth noting that Coutts 
(2019) provides experimental evidence in favor of the psychological rather than 
instrumental costs associated with belief distortion. 


2.2. MODEL


States and prior belief. A state of the world $\theta$ is drawn by Nature from a state
space $\Theta$ according to a prior distribution $\mu_0 \in \operatorname{int}(\Delta(\Theta))$.¹¹ Receiver (he) and
Sender (she) do not observe the state ex ante, but its prior distribution is common
knowledge.


Actions and payoffs. Receiver chooses an action $a$ from a compact space $A$ with
at least two actions. His material payoff is given by $u(a, \theta)$.¹² Receiver’s choice
affects Sender’s payoff, which is given by $v(a)$. Before Receiver takes his action,
Sender can commit to any signal structure $(\sigma, S)$ given by an endogenously chosen
set of signal realizations $S$ and a stochastic mapping $\sigma\colon \Theta \to \Delta(S)$ associating
any realized state $\theta$ to a conditional distribution $\sigma(\theta)$ over $S$.


Receiver’s behavior. For any belief $\eta \in \Delta(\Theta)$, Receiver’s optimal action
correspondence is given by

$$A(\eta) = \arg\max_{a \in A} \int_{\Theta} u(a, \theta)\, \eta(d\theta).$$

Without loss of generality, we assume that no action is dominated, i.e., for any
action $a \in A$ there always exists some belief $\eta$ such that $a \in A(\eta)$. When the set


¹⁰ One can relate this possible microfoundation of the belief distortion cost to the literature on
lying costs (Abeler, Becker, and Falk, 2014; Abeler, Nosenzo, and Raymond, 2019) since, when
Receiver distorts his subjective belief away from the rational Bayesian belief, he is essentially
lying to himself. We thank Emeric Henry for suggesting this interpretation of the cost function to us.

¹¹ In what follows, for any nonempty Polish space $X$, we denote by $\Delta(X)$ the set of Borel probability
measures over the measurable space $(X, \mathcal{B}(X))$. We always endow $\Delta(X)$ with the weak*-topology. If
the support of a measure $\mu \in \Delta(X)$ is finite, we adopt the shorthand notation $\mu(\{x\}) = \mu(x)$ for any
$x \in \operatorname{supp}(\mu)$.

¹² We assume the map $u(a, \cdot)\colon \Theta \to \mathbb{R}$ to be Borel measurable, continuous and bounded for any
$a \in A$.
$A(\eta)$ has more than one element, we break the tie in favor of Sender. That is, for
any belief $\eta$, the action played by Receiver in equilibrium is given by a selection
$a(\eta) \in A(\eta)$ which maximizes Sender’s expected payoff.¹³


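Receiver’s beliefs. After observing any signal realization $s \in S$, a Bayesian
decision-maker’s belief is given by

$$\mu(\tilde{\Theta} \mid s) = \frac{\int_{\tilde{\Theta}} \sigma(s \mid \theta)\, \mu_0(d\theta)}{\int_{\Theta} \sigma(s \mid \theta)\, \mu_0(d\theta)},$$

for any Borel set $\tilde{\Theta} \subseteq \Theta$.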

In contrast, we assume that, when forming beliefs, Receiver trades off the
psychological benefit against the psychological cost of holding possibly non-Bayesian
beliefs. The psychological benefit of Receiver under a certain belief $\eta$ is
given by his anticipated material payoff

$$U(\eta) = \int_{\Theta} u(a(\eta), \theta)\, \eta(d\theta).$$

However, holding belief $\eta$ when the Bayesian belief generated by some signal is $\mu$
comes at a psychological cost $C(\eta, \mu)$ for Receiver. We assume that this cost is
given by the Kullback-Leibler divergence between $\eta$ and $\mu$, formally defined by

$$C(\eta, \mu) = \int_{\Theta} \frac{d\eta}{d\mu} \ln\left(\frac{d\eta}{d\mu}\right) \mu(d\theta),$$

for any $\eta, \mu \in \Delta(\Theta)$, where $d\eta/d\mu$ is the Radon-Nikodym derivative of $\eta$ with
respect to $\mu$, defined whenever $\eta$ is absolutely continuous with respect to $\mu$. This
assumption is made for tractability but does not qualitatively affect our main
results.¹⁴ Accordingly, we define Receiver’s psychological payoff as


$$\Psi(\eta, \mu) = U(\eta) - \frac{1}{\rho} C(\eta, \mu)$$

for any $\eta, \mu \in \Delta(\Theta)$, where $\rho \in \mathbb{R}_{+}^{*}$ parametrizes the extent of Receiver’s wishfulness.

¹³ There might be more than one such selection if there exists some $\eta \in \Delta(\Theta)$ at which Sender is
indifferent between some actions in $A(\eta)$. In that case, we arbitrarily pick one of them.

¹⁴ We show in appendix B.1 that our results on Receiver’s equilibrium beliefs and behavior continue to hold
when the psychological cost function belongs to a more general class of statistical divergences.

Receiver’s belief $\eta$ must maximize his psychological payoff given any
Bayesian belief $\mu$. Therefore, it must belong to the optimal beliefs correspondence

$$B(\mu) = \arg\max_{\eta \in \Delta(\Theta)} \Psi(\eta, \mu),$$

for any $\mu \in \Delta(\Theta)$, and Receiver’s psychological payoff when he holds a belief
$\eta \in B(\mu)$ is

$$\Psi(\mu) = \max_{\eta \in \Delta(\Theta)} \Psi(\eta, \mu),$$

for any Bayesian posterior $\mu \in \Delta(\Theta)$.¹⁵ We assume that when Receiver is
psychologically indifferent between several beliefs in $B(\mu)$, he picks the one that
maximizes Sender’s expected utility. Therefore, Receiver’s equilibrium belief is
given by a selection $\eta(\mu) \in B(\mu)$ which maximizes Sender’s expected payoff.¹⁶
This tie-breaking rule ensures that Receiver’s equilibrium belief is uniquely
defined and simplifies the characterization of the optimal information policy.


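Persuasion problem. We can equivalently think of Sender committing ex ante
to a signal structure $(\sigma, S)$ or to an information policy $\tau \in \mathcal{T}(\mu_0)$, where

$$\mathcal{T}(\mu_0) = \left\{ \tau \in \Delta(\Delta(\Theta)) : \int_{\Delta(\Theta)} \mu(\tilde{\Theta})\, \tau(d\mu) = \mu_0(\tilde{\Theta}) \text{ for any Borel set } \tilde{\Theta} \subseteq \Theta \right\}$$

is the set of Bayes-plausible distributions over posterior beliefs given the prior $\mu_0$.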
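
As an illustrative sketch of Bayes plausibility in a finite setting, the following Python snippet computes the posterior induced by each signal realization and checks that the posteriors average back to the prior. The two-state prior and the signal structure are hypothetical numbers chosen for illustration, not taken from the text.

```python
def posteriors(prior, sigma):
    """Return (probability of each signal, Bayesian posterior after each signal).

    prior[i]    : prior probability of state i
    sigma[i][s] : probability of signal s conditional on state i
    """
    n_states, n_signals = len(prior), len(sigma[0])
    signal_probs, beliefs = [], []
    for s in range(n_signals):
        joint = [prior[i] * sigma[i][s] for i in range(n_states)]
        p_s = sum(joint)
        if p_s > 0:  # skip signals that never occur
            signal_probs.append(p_s)
            beliefs.append([j / p_s for j in joint])
    return signal_probs, beliefs


# Hypothetical prior and signal structure (illustration only).
prior = [0.7, 0.3]
sigma = [[0.8, 0.2],
         [0.1, 0.9]]

tau, mus = posteriors(prior, sigma)

# Bayes plausibility: the tau-weighted average posterior equals the prior.
average = [sum(t * mu[i] for t, mu in zip(tau, mus)) for i in range(len(prior))]
```

Here `tau` plays the role of the distribution over posteriors and `mus` lists its support; any signal structure yields a pair satisfying the averaging constraint.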

We assume Sender knows Receiver is a wishful thinker. Accordingly, she 
correctly anticipates the belief Receiver holds in equilibrium. Since Receiver’s 
equilibrium belief characterizes how he would distort his belief away from any 
realized Bayesian posterior, Sender can choose the best information policy by 
backward induction, knowing: (i) which belief $\eta(\mu)$ Receiver holds in equilibrium
after a posterior $\mu \in \operatorname{supp}(\tau)$ is realized and (ii) which action $a(\eta(\mu))$ Receiver


¹⁵ As already noted by Bracha and Brown (2012) as well as Caplin and Leahy (2019), this
optimization problem has a similar mathematical structure to the multiplier preferences developed in
Hansen and Sargent (2008) and axiomatized in Strzalecki (2011). Precisely, the agent in Strzalecki
(2011) solves

$$\max_{a \in A} \min_{\eta \in \Delta(\Theta)} \int_{\Theta} u(a, \theta)\, \eta(d\theta) + \frac{1}{\rho} C(\eta, \mu), \tag{2.1}$$

for any given $\mu \in \Delta(\Theta)$. In that model, the parameter $\rho$ measures the degree of confidence
of the decision-maker in the belief $\mu$ or, in other words, the importance he attaches to belief
misspecification. Conclusions on the belief distortion in that setting are naturally reversed with
respect to our model: a receiver forming beliefs according to equation (2.1) would form overcautious
beliefs. Studying how a rational Sender would persuade a Receiver concerned by robustness seems
an interesting path for future research.

¹⁶ Again, if Sender is indifferent between some beliefs, we arbitrarily pick one of them.
chooses in equilibrium given the distorted belief $\eta(\mu)$. Sender’s indirect payoff
function is therefore given by

$$v(\mu) = v\big(a(\eta(\mu))\big)$$

for any $\mu \in \Delta(\Theta)$ and, hence, Sender’s value from persuading a wishful Receiver
under the prior $\mu_0$ is

$$V(\mu_0) = \max_{\tau \in \mathcal{T}(\mu_0)} \int_{\Delta(\Theta)} v(\mu)\, \tau(d\mu). \tag{2.2}$$


2.3. RECEIVER’S WISHFUL BELIEFS AND BEHAVIOR 


In this section, we first extend Caplin and Leahy’s (2019) results by characterizing
Receiver’s equilibrium beliefs and behavior without imposing any restrictions on 
the action or state space. 

To begin with, let Receiver’s anticipated material payoff under action $a$ and
belief $\eta$ be defined by

$$U_a(\eta) = \int_{\Theta} u(a, \theta)\, \eta(d\theta).$$

Moreover, let

$$\eta_a(\mu) = \arg\max_{\eta \in \Delta(\Theta)} U_a(\eta) - \frac{1}{\rho} C(\eta, \mu)$$

be Receiver’s belief motivated by action $a$ under posterior $\mu$ and

$$\Psi_a(\mu) = \max_{\eta \in \Delta(\Theta)} U_a(\eta) - \frac{1}{\rho} C(\eta, \mu)$$

be Receiver’s maximal psychological payoff motivated by action $a$ under posterior
$\mu$. We identify Receiver’s equilibrium belief $\eta(\mu)$ by: (i) finding the belief
motivated by action $a$ under $\mu$, resulting in psychological payoff $\Psi_a(\mu)$, for any $a$
and $\mu$; (ii) finding which action it is optimal to motivate by maximizing $\Psi_a(\mu)$
with respect to $a$. Proposition 7 characterizes $\eta_a(\mu)$ and $\Psi_a(\mu)$ in closed form.


Proposition 7. Receiver’s maximal psychological payoff motivated by action $a$
under the Bayesian posterior $\mu$ is given by

$$\Psi_a(\mu) = \frac{1}{\rho} \ln\left( \int_{\Theta} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta) \right), \tag{2.3}$$

and is attained uniquely at the belief

$$\eta_a(\mu)(\tilde{\Theta}) = \frac{\int_{\tilde{\Theta}} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta)}{\int_{\Theta} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta)}, \tag{2.4}$$

for any Borel set $\tilde{\Theta} \subseteq \Theta$.

Proof. See appendix B.1. □
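
The closed forms in proposition 7 can be checked numerically on a finite state space. The sketch below is only an illustration under assumed numbers (the three-state posterior, the payoff vector and the value of the wishfulness parameter are hypothetical): it builds the exponentially tilted belief, evaluates the psychological payoff at that belief, and compares it with the log-sum-exp expression and with randomly drawn alternative beliefs.

```python
import math
import random


def tilted_belief(mu, payoffs, rho):
    """Belief motivated by an action: mu reweighted by exp(rho * u)."""
    weights = [m * math.exp(rho * u) for m, u in zip(mu, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]


def kl(eta, mu):
    """Kullback-Leibler divergence of eta from mu on a finite state space."""
    return sum(e * math.log(e / m) for e, m in zip(eta, mu) if e > 0)


def psych_payoff(eta, mu, payoffs, rho):
    """Anticipated payoff under eta minus (1/rho) times the KL cost."""
    expected = sum(e * u for e, u in zip(eta, payoffs))
    return expected - kl(eta, mu) / rho


def closed_form_value(mu, payoffs, rho):
    """(1/rho) * ln E_mu[exp(rho * u)]."""
    return math.log(sum(m * math.exp(rho * u) for m, u in zip(mu, payoffs))) / rho


# Hypothetical inputs (illustration only).
mu = [0.5, 0.3, 0.2]
payoffs = [1.0, 0.0, 4.0]
rho = 0.8

eta_star = tilted_belief(mu, payoffs, rho)
best = psych_payoff(eta_star, mu, payoffs, rho)

# Randomly drawn beliefs should never do better than the tilted belief.
rng = random.Random(0)


def random_belief(n):
    draws = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


worst_gap = min(best - psych_payoff(random_belief(3), mu, payoffs, rho)
                for _ in range(1000))
```

The tilted belief puts more weight than `mu` on the state with the highest payoff, which is the overoptimism property discussed in the text.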


Remark now that if the action $a$ uniquely maximizes Receiver’s psychological
payoff under Bayesian posterior $\mu$, we have $\eta(\mu) = \eta_a(\mu)$. If, on the other hand,
$\Psi_a(\mu) = \Psi_{a'}(\mu)$ at $\mu$ for some $a' \neq a$, meaning that Receiver is psychologically
indifferent between two beliefs, then Sender breaks the tie. As a consequence, if
$\mu \in \Delta(\Theta)$ satisfies

$$\Psi_a(\mu) > \Psi_{a'}(\mu), \tag{2.5}$$

for all $a' \neq a$, meaning that Receiver psychologically prefers action $a$ to any other
action $a'$, then Receiver’s equilibrium belief is given by

$$\eta(\mu)(\tilde{\Theta}) = \eta_a(\mu)(\tilde{\Theta}),$$

for any Borel set $\tilde{\Theta} \subseteq \Theta$. If $\mu \in \Delta(\Theta)$ satisfies

$$\Psi_a(\mu) = \Psi_{a'}(\mu),$$

for some $a' \neq a$, meaning that Receiver is psychologically indifferent between
some actions $a'$ and $a$, then Sender picks her preferred belief given by

$$\eta(\mu)(\tilde{\Theta}) = \eta_{a^*}(\mu)(\tilde{\Theta}),$$

where $a^* \in \arg\max_{\tilde{a} \in \{a, a'\}} v(\tilde{a})$.

First, we can see from equation (2.4) that Receiver only distorts beliefs that
induce actions with state-dependent payoffs, i.e., Receiver’s beliefs are stakes-dependent.
Formally, for any $a \in A$, we have $\eta_a(\mu) \neq \mu$ if, and only if, there exist
$\theta \neq \theta'$ such that $u(a, \theta) \neq u(a, \theta')$. Second, Receiver forms beliefs that overweight
the states associated with the highest payoff, giving rise to overoptimism. Formally,
we always have $\eta_a(\mu)(\Theta_a) \geq \mu(\Theta_a)$ for any $a \in A$, where $\Theta_a = \arg\max_{\theta \in \Theta} u(a, \theta)$.
Moreover, Receiver’s belief about payoff-maximizing states $\eta_a(\mu)(\Theta_a)$ grows
monotonically and eventually converges to 1 as Receiver’s wishfulness $\rho$ grows
from 0 to $+\infty$.¹⁷

As proposition 7 shows, wishful thinking leads Receiver to hold overoptimistic 
beliefs. The next result shows that wishful thinking distorts Receiver’s behavior 
accordingly. 


Corollary 3. Under his equilibrium belief, Receiver’s optimal action correspondence
is given by

$$A(\eta(\mu)) = \arg\max_{a \in A} \int_{\Theta} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta),$$

for any $\mu \in \Delta(\Theta)$, so Receiver’s equilibrium action $a(\eta(\mu))$ corresponds to Sender’s
preferred selection in $A(\eta(\mu))$.


Remark that this result comes as a direct consequence of proposition 7 as, by
definition, any action $a$ is optimal under the belief motivated by action $a$. As already
observed by Caplin and Leahy (2019), the previous result states, in essence, that a
Receiver forming wishful beliefs behaves as a Bayesian agent whose preferences
are distorted by the function $z \mapsto \exp(\rho z)$ for any $z \in \mathbb{R}$. Importantly, from
Sender’s point of view, a wishful Receiver’s behavior is indistinguishable from
that of a Bayesian rational agent with payoff function $\exp(\rho u(a, \theta))$. Accordingly,
since the function $z \mapsto \exp(\rho z)$ is strictly convex as soon as $\rho > 0$, an agent
forming wishful beliefs is less risk averse than his Bayesian self.

Corollary 3 also shows that wishful thinking materializes in the form of “motivated
errors” in the sense of Exley and Kessler (2019): by choosing psychologically
desirable beliefs, Receiver commits systematic errors in his decision-making, i.e.,
acts as if he had cognitive limitations or behavioral biases relative to a Bayesian
decision-maker.
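
The equivalence in corollary 3 can be illustrated with a small numerical sketch. The two actions, their payoffs, the posterior and the wishfulness level below are hypothetical, and tie-breaking in favor of Sender is ignored for simplicity. The snippet picks the action maximizing the expectation of $\exp(\rho u)$ under the Bayesian posterior, computes the belief motivated by that action, and confirms that the same action is optimal in the usual expected-payoff sense under the distorted belief.

```python
import math


def expected(belief, payoffs):
    """Expected payoff of a payoff vector under a belief."""
    return sum(b * u for b, u in zip(belief, payoffs))


def bayesian_choice(belief, u):
    """Action maximizing expected material payoff under `belief`."""
    return max(u, key=lambda a: expected(belief, u[a]))


def wishful_choice(mu, u, rho):
    """Action maximizing E_mu[exp(rho * u(a, .))] and the belief it motivates."""
    scores = {a: expected(mu, [math.exp(rho * x) for x in row])
              for a, row in u.items()}
    action = max(scores, key=scores.get)
    weights = [m * math.exp(rho * x) for m, x in zip(mu, u[action])]
    total = sum(weights)
    return action, [w / total for w in weights]


# Hypothetical example: a safe action and a risky one, two equally likely states.
u = {"safe": [1.0, 1.0], "risky": [-2.0, 3.0]}
mu = [0.5, 0.5]
rho = 1.0

action, eta = wishful_choice(mu, u, rho)
```

With these numbers a Bayesian agent picks the safe action, while the wishful agent picks the risky one and holds a belief concentrated on the good state, in line with the reduced risk aversion noted above.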


2.4. SENDER’S VALUE FROM PERSUASION 


In this section, we assume that the action space of Receiver is binary, so A = {0, 1}, 
and that Sender wants to induce a = 1, so v(a) = a. We provide necessary and 
sufficient conditions on Receiver’s preferences under which he would take action 1 
under a greater set of beliefs than a Bayesian Receiver. This allows us to compare 
Sender’s value from persuading a wishful rather than a Bayesian Receiver as a 
function of the model’s primitives, that is: Receiver’s preferences and wishfulness. 


¹⁷ This property comes from the fact that wishful beliefs take the form of a softmax function.
For the sake of completeness, we provide a proof of this result in appendix B.2.
The restriction to a binary set of actions entails a loss of generality but improves
tractability.
We start by defining the two following sets of beliefs:

$$A_a^B = \{\mu \in \Delta(\Theta) : a \in A(\mu)\},$$

and

$$A_a^W = \{\mu \in \Delta(\Theta) : a \in A(\eta(\mu))\},$$

for any $a \in A$. The set $A_a^B$ (resp. $A_a^W$) is the subset of posterior beliefs supporting
an action $a$ as optimal for a Bayesian (resp. wishful) Receiver. We say that an
action is favored by a wishful Receiver if that action is supported as optimal on a
strictly larger set of posterior beliefs by a wishful Receiver compared to a Bayesian.


Definition 6 (Favored action). An action $a \in A$ is favored by a wishful Receiver if
$A_a^B \subset A_a^W$.


Assume from now on that $\Theta = \{\underline{\theta}, \bar{\theta}\}$. We first characterize when a wishful
Receiver favors action $a = 1$ when the state space is binary and show afterwards
that our results extend to any finite state space. Let us denote $u(a, \bar{\theta}) = \bar{u}_a$ and
$u(a, \underline{\theta}) = \underline{u}_a$ for any $(a, \theta) \in A \times \Theta$. Assume that Receiver wants to “match the
state,” such that $\bar{u}_1 > \bar{u}_0$ and $\underline{u}_0 > \underline{u}_1$. Define the payoff variability under action 0 by
$\hat{u}_0 = \underline{u}_0 - \bar{u}_0$, the payoff variability under action 1 by $\hat{u}_1 = \bar{u}_1 - \underline{u}_1$ and the indicator
of the highest achievable payoff by $u_{\max} = \underline{u}_0 - \bar{u}_1$. With a small abuse of notation,
denote $\eta = \eta(\bar{\theta})$ and $\mu = \mu(\bar{\theta})$.

By corollary 3, comparing how a wishful Receiver behaves compared to a
Bayesian one is equivalent to comparing the behavior of two Bayesian receivers
with respective payoff functions $\exp(\rho u(a, \theta))$ and $u(a, \theta)$. Thus, denote by $\mu^B$
(resp. $\mu^W(\rho)$) the belief at which a Receiver with preferences $u(a, \theta)$ (resp.
$\exp(\rho u(a, \theta))$) is indifferent between the two actions. Those beliefs are respectively
equal to

$$\mu^B = \frac{\underline{u}_0 - \underline{u}_1}{\underline{u}_0 - \underline{u}_1 + \bar{u}_1 - \bar{u}_0},$$

and

$$\mu^W(\rho) = \frac{\exp(\rho \underline{u}_0) - \exp(\rho \underline{u}_1)}{\exp(\rho \underline{u}_0) - \exp(\rho \underline{u}_1) + \exp(\rho \bar{u}_1) - \exp(\rho \bar{u}_0)}.$$

With only two states, a wishful Receiver favors action $a = 1$ if and only if $\mu^W < \mu^B$,
since whenever that condition is satisfied a wishful Receiver takes action $a = 1$
under a larger set of beliefs than a Bayesian. The next lemma characterizes when
this is the case.
Lemma 11. Action $a = 1$ is favored by a wishful Receiver if, and only if:
(i) $u_{\max} < 0$ and $\hat{u}_0 < \hat{u}_1$, or;

(ii) $u_{\max} < 0$, $\hat{u}_0 > \hat{u}_1$ and $\rho > \bar{\rho}$, or;

(iii) $u_{\max} > 0$, $\hat{u}_0 < \hat{u}_1$ and $\rho < \bar{\rho}$,

where $\bar{\rho}$ is a strictly positive threshold such that $\mu^W(\bar{\rho}) = \mu^B$.

Proof. See appendix B.3. □


Two key aspects of Receiver’s material payoff thus determine which action 
he favors: the highest achievable payoff as well as the payoff variability for both 
actions. It is easy to grasp the importance of the highest payoff. Since the wishful 
thinker always distorts his beliefs in the direction of the most favorable outcome, 
in the limit, when there is no cost of distorting the Bayesian belief, Receiver would 
fully delude himself and always play the action that potentially yields such a payoff. 
The payoff variability $\hat{u}_a$, on the other hand, is precisely Receiver’s marginal
psychological benefit from distorting his belief under action $a$. Hence, the higher
the payoff variability associated with action $a$, the more relevant the uncertainty about $\theta$ is
when such action is played and the bigger the marginal gain in anticipatory
payoff the wishful thinker would get from distorting beliefs.

Lemma 11 states that if an action $a$ has both the highest payoff ($\underline{u}_0$ or $\bar{u}_1$) and
the greatest payoff variability $\hat{u}_a$ among all actions $a \in A$, it is always favored. If
an action has only the highest payoff or only the greatest payoff variability, then the
wishfulness parameter $\rho$ determines whether or not it is favored: for high wishfulness
the action with the highest payoff is favored, whereas for low wishfulness it is
the action with the greatest payoff variability that is favored. The intuition is the
following: for sufficiently high values of Receiver’s wishfulness, Receiver can
afford stronger overoptimism about the most desired outcome, thus favoring the
action that potentially yields this outcome even though that action is not associated
with the highest marginal psychological benefit. In contrast, for sufficiently low
values of $\rho$, Receiver cannot afford too much overoptimism about the most desired
outcome. Hence, he prefers to distort beliefs at the margin that yields the highest
marginal psychological benefit, such that the action associated with the highest
payoff variability is favored.
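
To make the role of the wishfulness parameter concrete, the following sketch evaluates the Bayesian and wishful indifference beliefs of the binary-state case for one hypothetical payoff profile. The four payoff values and the two wishfulness levels are assumptions chosen for illustration: action 0 yields the highest payoff, while action 1 has the greater within-action payoff variability, so the example falls under the configuration of case (iii).

```python
import math


def mu_bayes(u0_low, u0_high, u1_low, u1_high):
    """Belief on the high state making a Bayesian Receiver indifferent."""
    num = u0_low - u1_low
    return num / (num + u1_high - u0_high)


def mu_wishful(rho, u0_low, u0_high, u1_low, u1_high):
    """Same threshold for an agent with payoffs exp(rho * u)."""
    num = math.exp(rho * u0_low) - math.exp(rho * u1_low)
    return num / (num + math.exp(rho * u1_high) - math.exp(rho * u0_high))


# Hypothetical payoffs satisfying "match the state":
# u1_high > u0_high and u0_low > u1_low.
payoffs = dict(u0_low=2.0, u0_high=1.0, u1_low=0.0, u1_high=1.5)

u_max = payoffs["u0_low"] - payoffs["u1_high"]   # positive: action 0 has the top payoff
var0 = payoffs["u0_low"] - payoffs["u0_high"]    # variability under action 0
var1 = payoffs["u1_high"] - payoffs["u1_low"]    # variability under action 1

mu_b = mu_bayes(**payoffs)
low = mu_wishful(0.1, **payoffs)    # low wishfulness
high = mu_wishful(5.0, **payoffs)   # high wishfulness
```

In this example the wishful threshold lies below the Bayesian one for the low wishfulness level and above it for the high one, so action 1 is favored only when wishfulness is small.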

The next proposition extends lemma 11 to an arbitrary finite number of states. 




Proposition 8. Assume $\Theta$ is a finite set with more than two elements. Receiver
favors action $a = 1$ if, and only if, for any pair of states $\theta, \theta' \in \Theta$, Receiver’s
material payoffs associated with those states and his wishfulness parameter $\rho$
satisfy one of the conditions (i), (ii) or (iii) in lemma 11.


Proof. See appendix B.4. □


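Proposition 8 can easily be visualized graphically in an example with three
states. Assume $\Theta = \{0, 1, 2\}$ and denote by $\mu^B_{\theta, \theta'}$ (resp. $\mu^W_{\theta, \theta'}$) the belief making a
Bayesian (resp. wishful) Receiver indifferent between actions $a = 0$ and $a = 1$
when $\mu(\theta), \mu(\theta') > 0$ but $\mu(\theta'') = 0$ for any $\theta, \theta', \theta'' \in \Theta$. In figure 2.1 we
illustrate how $A_1^W$ compares to $A_1^B$ when Receiver’s payoff function is given by:

$$\begin{array}{c|ccc} u(a, \theta) & \theta = 0 & \theta = 1 & \theta = 2 \\ \hline a = 0 & 2 & 3 & -1 \\ a = 1 & 1 & 0 & 4 \end{array}$$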


Figure 2.1: Comparison of supporting sets of beliefs. In blue, the set of Bayesian
posteriors supporting action $a = 1$ for a Bayesian Receiver. In red, the set of
Bayesian posteriors supporting action $a = 1$ for a wishful Receiver.


Notice that for the two pairs of states (0, 2) and (1, 2), the associated payoffs satisfy
property (i) in lemma 11. That is, action $a = 1$ is associated with the highest payoff
$u(1, 2) = 4$ as well as the highest payoff variability $u(1, 2) - u(0, 2) = 5$, under
both pairs of states. As a consequence, lemma 11 applies when focusing on either of
those two pairs of states and assigning probability zero to the remaining state. Then,
we have $\mu^B_{0,2} > \mu^W_{0,2}$ and $\mu^B_{1,2} > \mu^W_{1,2}$. Remark now that $A_1^B = \operatorname{co}(\{\mu^B_{0,2}, \mu^B_{1,2}, \delta_2\})$
and $A_1^W = \operatorname{co}(\{\mu^W_{0,2}, \mu^W_{1,2}, \delta_2\})$, where $\delta_\theta$ denotes the Dirac distribution on state
$\theta \in \Theta$. Consequently, $A_1^B \subset A_1^W$, so action $a = 1$ is favored by Receiver. If one
of the conditions highlighted in lemma 11 were not satisfied for at least one of
the pairs of states (0, 2) or (1, 2), then one of the thresholds $\mu^B_{\theta, 2}$ would be less than or
equal to $\mu^W_{\theta, 2}$, in which case $A_1^W$ would no longer be a superset of $A_1^B$.
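
The comparison of the two supporting sets in this three-state example can be reproduced numerically. The sketch below uses the payoff table above and, as an assumption, a wishfulness level of 1 (the value underlying figure 2.1 is not stated here). Thresholds are expressed as the probability assigned to state 2 along each edge of the simplex joining state 2 to one of the other two states.

```python
import math

# Payoff table of the example: U[a][theta] for theta in {0, 1, 2}.
U = {0: [2.0, 3.0, -1.0],
     1: [1.0, 0.0, 4.0]}


def edge_threshold(low, transform=lambda x: x):
    """Probability of state 2 making Receiver indifferent between the two
    actions when only states `low` and 2 have positive probability."""
    num = transform(U[0][low]) - transform(U[1][low])
    den = num + transform(U[1][2]) - transform(U[0][2])
    return num / den


rho = 1.0  # hypothetical wishfulness level


def tilt(x):
    """Payoff transformation of the equivalent Bayesian agent."""
    return math.exp(rho * x)


bayes = {low: edge_threshold(low) for low in (0, 1)}
wishful = {low: edge_threshold(low, tilt) for low in (0, 1)}
```

On both edges the wishful threshold is smaller than the Bayesian one, so the set of posteriors supporting action 1 is larger for the wishful Receiver in this parametrization.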

Let us now turn our attention to the following questions: when is Sender
better off facing a wishful Receiver compared to a Bayesian, and how does the
(Blackwell) informativeness of Sender’s optimal policy compare when persuading
a wishful or a Bayesian Receiver? Remember that Sender chooses an information
policy $\tau \in \Delta(\Delta(\Theta))$ maximizing

$$\int_{\Delta(\Theta)} v(\mu)\, \tau(d\mu),$$

where

$$v(\mu) = \begin{cases} 1 & \text{if } \mu \in A_1^W \\ 0 & \text{otherwise} \end{cases}$$

subject to the Bayes plausibility constraint

$$\int_{\Delta(\Theta)} \mu\, \tau(d\mu) = \mu_0.$$

In the binary state case, it means that the threshold belief $\mu^W$ corresponds to the
smallest Bayesian posterior Sender needs to induce to persuade a wishful Receiver
to take action $a = 1$. Therefore, lemma 11 and proposition 8 have immediate
consequences for Sender.


Corollary 4. Let $\Theta$ be an arbitrary finite space with at least two elements. Then,
Sender always achieves a weakly higher payoff when interacting with a wishful
Receiver compared to a Bayesian for any prior $\mu_0 \in\, ]0, 1[$ if, and only if, for any
pair of states $\theta, \theta' \in \Theta$, Receiver’s material payoffs associated with those states
and his wishfulness parameter $\rho$ satisfy one of the conditions (i), (ii) or (iii) in
lemma 11. Moreover, when the state space is binary, Sender’s optimal information
policy is always weakly less (Blackwell) informative than in the Bayesian case.


To illustrate corollary 4, we represent in figure 2.2 the concavifications of Sender’s
indirect utility when Receiver is wishful or Bayesian in two different cases. The case
corresponding to lemma 11 is represented in figure 2.2a. Sender is always better off
persuading a wishful compared to a Bayesian receiver, as $V(\mu_0) \geq V^{KG}(\mu_0)$ for
any $\mu_0 \in\, ]0, 1[$. On the other hand, if Receiver’s preferences or wishfulness do not
satisfy any of the properties in lemma 11, then Sender is weakly worse off under
any prior. This case is represented in figure 2.2b.

Figure 2.2: Expected payoffs under optimal information policies. (a) At least one
property in lemma 11 is satisfied. (b) No property in lemma 11 is satisfied. Red curves:
expected payoffs under wishful thinking. Blue curves: expected payoffs when
Receiver is Bayesian. Dashed-dotted green lines: expected payoffs under a fully
revealing experiment.
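
In the binary-state case, Sender’s value has a simple form once Receiver’s behavior is summarized by a threshold on the Bayesian posterior. The sketch below relies on the standard concavification argument for a threshold receiver and on hypothetical threshold values (the function names and numbers are illustrative assumptions): if the prior is below the threshold, an optimal policy splits it between the posterior 0 and the threshold itself.

```python
def sender_value(prior, threshold):
    """Sender's value when Receiver takes a = 1 if and only if the
    Bayesian posterior on the high state is at least `threshold`."""
    if prior >= threshold:
        return 1.0              # no information is needed
    return prior / threshold    # probability of reaching the threshold posterior


def optimal_policy(prior, threshold):
    """Support and weights of one optimal policy, as {posterior: weight}."""
    if prior >= threshold:
        return {prior: 1.0}
    weight = prior / threshold
    return {0.0: 1.0 - weight, threshold: weight}


# Hypothetical thresholds with the wishful one below the Bayesian one.
mu_bayes, mu_wishful = 0.5, 0.4
priors = [0.1, 0.2, 0.3, 0.45, 0.6]

gains = [sender_value(p, mu_wishful) - sender_value(p, mu_bayes) for p in priors]
policy = optimal_policy(0.2, mu_wishful)
```

A lower threshold raises Sender’s value at every prior below the Bayesian threshold and leaves it unchanged above, which mirrors the weak dominance stated in corollary 4.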
When Sender wants to induce an action that is (resp. is not) favored by a
wishful Receiver, persuasion is always “easier” (resp. “harder”) for Sender in
the following sense: Sender needs a strictly less (resp. strictly more) Blackwell
informative policy than under the Kamenica and Gentzkow (2011) benchmark (KG)
to persuade Receiver to take her preferred action.
Equivalently, if experiments were costly to produce, as in Gentzkow and Kamenica
(2014), then Sender would always need to consume fewer (resp. more) resources
to persuade a wishful Receiver to take her preferred action than a Bayesian.
The hypothesis of a binary state space facilitates the comparison between the
Bayesian-optimal and the wishful-optimal information policies, as it ensures that
the two policies are Blackwell
comparable. Although the informativeness comparisons in corollary 4 do not
necessarily extend when the state space contains more than two elements, Sender’s
welfare comparisons, in contrast, still hold under any arbitrary finite state space.
We compare in figure 2.3 Sender’s optimal information policies when Receiver
is Bayesian and wishful, with the same payoff function as in figure 2.1. When
the state space is finite, a policy $\tau \in \mathcal{T}(\mu_0)$ such that all elements in $\operatorname{supp}(\tau)$
are affinely independent is (weakly) more Blackwell-informative than a policy
$\tau' \in \mathcal{T}(\mu_0)$ if, and only if, $\operatorname{supp}(\tau') \subseteq \operatorname{co}(\operatorname{supp}(\tau))$ (see Lipnowski, Mathevet,
and Wei, 2020, Lemma 2). The supports of the Bayesian-optimal policy $\tau^B$ and of the
wishful-optimal policy $\tau^W$ each consist of two posteriors, so $\operatorname{co}(\operatorname{supp}(\tau^W))$ is the
segment joining the two posteriors in $\operatorname{supp}(\tau^W)$. It is visible on
figure 2.3 that $\operatorname{supp}(\tau^B) \not\subseteq \operatorname{co}(\operatorname{supp}(\tau^W))$. Hence, $\tau^B$ and $\tau^W$ are not Blackwell
comparable. However, since Sender is interested in inducing action $a = 1$ and
Receiver favors that action, Sender’s expected payoff is higher for any prior when
Receiver is wishful.

Figure 2.3: The Bayesian-optimal policy $\tau^B$ (in blue) vs. the wishful-optimal
policy $\tau^W$ (in red), with their respective supports.


2.5. APPLICATIONS 


In this section, we show through three applications that corollary 4 can have important
economic consequences.


2.5.1. Information provision and preventive health care 


A public health agency (Sender) informs an individual (Receiver) about the
prevalence of a certain disease. Receiver forms beliefs about the infection risk,
which can be either high or low: $0 < \underline{\theta} < \bar{\theta} < 1$. The probability of contracting
that illness also depends on whether the individual adopts a preventive treatment
or not, where $a = 1$ designates adoption. Investment in the treatment entails a cost
$c > 0$ to Receiver.¹⁸ Moreover, let us assume that the effectiveness of the treatment,


One might interpret that cost to be the price of the treatment or the either material or 
psychological cost from undertaking medical procedures. 
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i.e., the probability that the treatment works, is a € [0, 1] so that the probability of 
falling ill, conditional on adoption, is (1 — @)@. The payoff from staying healthy is 
normalized to 0 whereas the payoff from being infected equals —¢ < 0 where ¢ is 
the severity of the disease. Receiver’s payoff function is 


u(a,@) = (1 -—a)(-s@) + a(-(1 — @)@¢ - c) 


for any (a,@) € Ax ®. We assume that ¢a@ < c < ¢a@ so Receiver faces a 
trade-off: he would prefer not to invest if he was sure the probability of infection 
was low and, conversely, would prefer to invest in the treatment if he was sure the 
risk of infection is high. Also remark that Receiver always expects to experience a 
negative payoff, as u(a, 0) < 0 for any (a,@) EA x @. 

The public health agency wants to maximize the probability of individuals
adopting the preventive treatment.^{19} The agency informs individuals about the
prevalence of the disease by designing and committing to a Bayes-plausible
information policy $\tau$. A Bayesian Receiver would be indifferent between adopting
the treatment or not at belief

$$\mu^B = \frac{c - \varsigma\alpha\underline{\theta}}{\varsigma\alpha(\overline{\theta} - \underline{\theta})}.$$

In contrast, by proposition 7 and corollary 3, the equilibrium beliefs and behavior
of a wishful Receiver are given by

$$\eta(\mu) = \begin{cases}
\dfrac{\mu}{\mu + (1 - \mu)\exp(\rho\varsigma(\overline{\theta} - \underline{\theta}))} & \text{if } \mu < \mu^W, \\[2ex]
\dfrac{\mu\exp(-\rho(1 - \alpha)\varsigma(\overline{\theta} - \underline{\theta}))}{\mu\exp(-\rho(1 - \alpha)\varsigma(\overline{\theta} - \underline{\theta})) + (1 - \mu)} & \text{if } \mu \geq \mu^W,
\end{cases}$$

and

$$a(\eta(\mu)) = \mathbb{1}\{\mu \geq \mu^W\}$$

for any posterior belief $\mu \in [0, 1]$, where

$$\mu^W = \frac{\exp(-\rho\varsigma\underline{\theta}) - \exp(\rho(-(1 - \alpha)\varsigma\underline{\theta} - c))}{\exp(-\rho\varsigma\underline{\theta}) - \exp(\rho(-(1 - \alpha)\varsigma\underline{\theta} - c)) + \exp(\rho(-(1 - \alpha)\varsigma\overline{\theta} - c)) - \exp(-\rho\varsigma\overline{\theta})}.$$

^{19} Maximizing the probability of adoption is a sensible objective since most infections cause
negative externalities due to their transmission through social interactions. Therefore, a benevolent
planner who wants to reduce the likelihood of transmission of an infection would do well to maximize
the rate of adoption of the preventive treatment (for example, maximize condom distribution to
control AIDS transmission, maximize injection of vaccines to control viral infections, or maximize
mask use to control the spread of airborne diseases).


We illustrate the belief distortion of Receiver in figure 2.4a. Receiver is always
overoptimistic about his probability of staying healthy, as $\eta(\mu) < \mu$ for any
$\mu \in\, ]0, 1[$. Remark that non-adoption is associated with the highest possible
payoff $-\varsigma\underline{\theta}$ as well as the highest payoff variability
$\varsigma(\overline{\theta} - \underline{\theta})$. Accordingly, by
lemma 11, Receiver always favors non-adoption, as figure 2.4b illustrates. As a
result of corollary 4, Sender always needs to induce higher beliefs for Receiver
to adopt the treatment than she would need if she faced a Bayesian agent, all the
more so when Receiver’s wishfulness $\rho$ becomes larger. Therefore, in this example,
Receiver’s overoptimism always goes against Sender’s interest.

Figure 2.4: The belief correspondence for $\varsigma = 2$, $c = 0.5$, $\alpha = 0.8$,
$\underline{\theta} = 0.1$, $\overline{\theta} = 0.9$ and $\rho = 2$. (a) Equilibrium belief
$\eta(\mu)$ as a function of $\mu$. (b) Behavioral threshold $\mu^W$ as a function of $\rho$.
Receiver is always overoptimistic concerning his health risk for any
induced posterior, except at $\mu = 0$ or $\mu = 1$. Moreover, the belief threshold $\mu^W$ as
a function of $\rho$ is strictly increasing and admits $\mu^B$ as a lower bound.
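As a numerical illustration, the thresholds and the distorted belief can be evaluated at the parameter values of figure 2.4. The Python sketch below is only a reading aid. The function names are illustrative, `sev` stands for the severity $\varsigma$, and the formula for $\mu^W$ is taken to be the belief at which the exponential moments $\mathbb{E}_\mu[\exp(\rho u(a, \theta))]$ of the two actions coincide, which is an assumption about the form of the threshold.

```python
import math

def bayesian_threshold(c, sev, alpha, th_lo, th_hi):
    """Belief mu^B at which a Bayesian Receiver is indifferent between actions."""
    return (c - sev * alpha * th_lo) / (sev * alpha * (th_hi - th_lo))

def wishful_threshold(c, sev, alpha, th_lo, th_hi, rho):
    """Belief mu^W equating the exponential moments of the two actions' payoffs."""
    no_lo = math.exp(-rho * sev * th_lo)                       # exp(rho * u(0, low))
    no_hi = math.exp(-rho * sev * th_hi)                       # exp(rho * u(0, high))
    ad_lo = math.exp(rho * (-(1 - alpha) * sev * th_lo - c))   # exp(rho * u(1, low))
    ad_hi = math.exp(rho * (-(1 - alpha) * sev * th_hi - c))   # exp(rho * u(1, high))
    return (no_lo - ad_lo) / (no_lo - ad_lo + ad_hi - no_hi)

def wishful_belief(mu, c, sev, alpha, th_lo, th_hi, rho):
    """Distorted probability eta(mu) that the infection risk is high."""
    mu_w = wishful_threshold(c, sev, alpha, th_lo, th_hi, rho)
    # Payoff sensitivity to the state under the action actually chosen.
    scale = sev if mu < mu_w else (1 - alpha) * sev
    tilt = math.exp(-rho * scale * (th_hi - th_lo))
    return mu * tilt / (mu * tilt + (1 - mu))

# Parameter values reported in the caption of figure 2.4.
params = dict(c=0.5, sev=2.0, alpha=0.8, th_lo=0.1, th_hi=0.9)
mu_b = bayesian_threshold(**params)
mu_w = wishful_threshold(rho=2.0, **params)
eta_half = wishful_belief(0.5, rho=2.0, **params)
print(mu_b, mu_w, eta_half)
```

Under these assumptions the wishful threshold lies above the Bayesian one and the distorted belief lies below the Bayesian posterior, in line with the discussion above.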

It is interesting to see how Sender’s probability of inducing the adoption of
the treatment evolves with respect to the severity of the disease $\varsigma$, as well as the
effectiveness of the treatment $\alpha$.^{20} We represent on figure 2.5b the probability that
Sender induces adoption of the treatment under the optimal information policy as
a function of $\varsigma$. Notice that the probability of inducing adoption is less sensitive to
the severity of the disease, i.e., becomes “flatter,” when facing a wishful Receiver
compared to the Bayesian when the treatment becomes less effective. The intuition
is the following: when the treatment is fully effective, i.e., $\alpha = 1$, Receiver’s
payoff in case he invests in the treatment becomes state independent. Therefore,
he does not have any incentive to distort beliefs when taking action $a = 1$. As a
result, $\mu^W$ decreases and Receiver holds perfectly Bayesian beliefs when $\mu \geq \mu^W$.
However, whenever there is uncertainty about the treatment efficacy, i.e., $\alpha < 1$,
uncertainty about infection risk matters and gives room to belief distortion even
when taking the treatment. Decreasing $\alpha$ increases the anticipated anxiety of
Receiver, leading to more optimistically biased beliefs and a higher $\mu^W$, which in turn
complicates persuasion for Sender for any severity $\varsigma$. Remark on figure 2.5b that $\tau$
drops sharply as $\alpha$ decreases for a fixed $\varsigma$. In fact, one could show that as
$\alpha$ decreases, $\tau$ becomes closer and closer to $\mu_0$ for any $\varsigma$, meaning that
the agency cannot achieve a substantially higher payoff than under full disclosure.^{21}

^{20} This probability is pinned down by the Bayes-plausibility constraint and equal to
$\tau^B = \mu_0/\mu^B$ in the Bayesian case and $\tau^W = \mu_0/\mu^W$ in the wishful case.

Figure 2.5: (a) Behavioral thresholds $\mu^B$ (in blue) and $\mu^W$ (in red) as functions of
severity $\varsigma$. (b) Probability $\tau$ of inducing treatment adoption as a function of
severity $\varsigma$. Red (resp. blue) curves correspond to wishful (resp. Bayesian)
Receiver. We set parameters to $c = 0.5$, $\alpha = 0.8$, $\underline{\theta} = 0.1$,
$\overline{\theta} = 0.9$ and $\rho = 2$. Full
lines correspond to the case where $\alpha = 1$ whereas dashed curves correspond to
$\alpha = 0.8$.
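The comparative statics in $\alpha$ can be checked numerically. The sketch below recomputes both thresholds for a few values of the treatment effectiveness and converts them into adoption probabilities through the Bayes-plausibility formula $\tau = \mu_0/\mu$. The prior `MU0 = 0.1` and the grid of $\alpha$ values are hypothetical choices made for illustration. The other parameters follow the figure captions, and the expression for $\mu^W$ is again the assumed indifference condition on exponential moments.

```python
import math

def thresholds(alpha, c=0.5, sev=2.0, th_lo=0.1, th_hi=0.9, rho=2.0):
    """Return (mu^B, mu^W) for a given treatment effectiveness alpha."""
    mu_b = (c - sev * alpha * th_lo) / (sev * alpha * (th_hi - th_lo))
    no_lo = math.exp(-rho * sev * th_lo)
    no_hi = math.exp(-rho * sev * th_hi)
    ad_lo = math.exp(rho * (-(1 - alpha) * sev * th_lo - c))
    ad_hi = math.exp(rho * (-(1 - alpha) * sev * th_hi - c))
    mu_w = (no_lo - ad_lo) / (no_lo - ad_lo + ad_hi - no_hi)
    return mu_b, mu_w

def adoption_probability(mu0, threshold):
    """Probability of the persuasive posterior, capped at one when mu0 exceeds the threshold."""
    return min(1.0, mu0 / threshold)

MU0 = 0.1  # hypothetical prior probability that the infection risk is high
table = {}
for alpha in (1.0, 0.8, 0.6):
    mu_b, mu_w = thresholds(alpha)
    # Store (tau^B, tau^W) for this value of alpha.
    table[alpha] = (adoption_probability(MU0, mu_b), adoption_probability(MU0, mu_w))
    print(alpha, table[alpha])
```

For these values the wishful adoption probability is below the Bayesian one at every $\alpha$ and moves toward the prior as $\alpha$ falls.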

In the next subsection we extend our framework to the case of a continuous
state space and linear preferences. We show that results in the finite state space
case extend to this setting. We also highlight why we might expect persuasion to
be more effective in the context of risky investment decisions.

^{21} One additional implication of this result is the following. Assume the true treatment efficacy is
$\alpha$ but Receiver perceives the efficacy to be $\hat{\alpha} < \alpha$ (e.g. because Receiver adheres to
anti-vaccine movements or generally mistrusts the pharmaceutical industry). In that case, the doubts
expressed by Receiver about the treatment efficacy make him even more anxious which, in turn, makes
belief distortion stronger and, thus, reduces the effectiveness of the agency’s information policy
whatever the severity of the disease.


2.5.2. Persuading a wishful investor 


A financial broker (Sender) designs reports about the return of some risky financial
product to inform a potential client (Receiver). The return of the product is
$\theta \in \Theta = [\underline{\theta}, \overline{\theta}]$, where
$\underline{\theta} < 0 < \overline{\theta}$. Returns are distributed according to the prior
distribution $\mu_0$. Let $F$ be the cumulative distribution function associated with $\mu_0$
and let us assume that $\mu_0$ admits a continuous and strictly positive density function
$f$ over $[\underline{\theta}, \overline{\theta}]$. Receiver has some savings he is willing to
invest and chooses action $a \in A = \{0, 1\}$, where $a = 0$ represents not investing,
in which case Receiver’s payoff is 0, and $a = 1$ represents investing, in which case
Receiver’s payoff is the realized return $\theta$. The broker is remunerated on the basis
of a flat fee $v > 0$ that is independent of the product’s true profitability. Hence,
Receiver’s payoff is $u(a, \theta) = a\theta$ while Sender’s payoff is $v(a, \theta) = va$ for any
$(a, \theta) \in A \times \Theta$.

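Receiver forms motivated beliefs about the return of the financial product. By
proposition 7 his equilibrium beliefs are given by

$$\eta(\mu)(\tilde{\Theta}) = \begin{cases}
\mu(\tilde{\Theta}) & \text{if } \int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) < 1, \\[2ex]
\dfrac{\int_{\tilde{\Theta}} \exp(\rho\theta)\,\mu(d\theta)}{\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta)} & \text{if } \int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) \geq 1,
\end{cases}$$

for any $\mu \in \Delta(\Theta)$ and any Borel set $\tilde{\Theta} \subseteq \Theta$, and, by
corollary 3, his equilibrium behavior is given by

$$a(\eta(\mu)) = \mathbb{1}\left\{\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) \geq 1\right\}.$$

Therefore, Sender’s indirect utility is equal to

$$v(\mu) = v\,\mathbb{1}\left\{\int_{\Theta} \exp(\rho\theta)\,\mu(d\theta) \geq 1\right\}$$

for any $\mu \in \Delta(\Theta)$. To make the problem interesting, we assume that neither a
Bayesian nor a wishful Receiver would take action $a = 1$ under the prior. That is,
$\hat{m} = \int_{\Theta} \theta\,\mu_0(d\theta) < 0$ and
$\hat{x} = \int_{\Theta} \exp(\rho\theta)\,\mu_0(d\theta) < 1$.^{22}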


^{22} It is in fact always true that $\hat{m} < 0$ when $\hat{x} < 1$, since Jensen's inequality
gives $\exp(\rho\hat{m}) \leq \hat{x}$. Hence, assuming $\hat{m} < 0$ in addition to
$\hat{x} < 1$ is without loss.

Under these assumptions, remark that a signal structure $\sigma$ that induces a
distribution $\tau$ over posterior beliefs $\mu$ matters for Receiver and Sender only through
the distribution of exponential moments $x = \int_{\Theta} \exp(\rho\theta)\,\mu(d\theta)$ it
induces. Let $X$ be the space of such moments, that is, $X = \mathrm{co}(\exp(\rho\Theta))$,
where $\exp(\rho\Theta)$ is the image of the function $\theta \mapsto \exp(\rho\theta)$ over
$[\underline{\theta}, \overline{\theta}]$. That is, $X = [\underline{x}, \overline{x}]$ where
$\underline{x} = \exp(\rho\underline{\theta})$ and $\overline{x} = \exp(\rho\overline{\theta})$.
Let $G$ be the prior cumulative distribution function of the random variable
$\exp(\rho\theta)$ induced by $F$, that is

$$G(x) = F\left(\frac{\ln(x)}{\rho}\right)$$

for any $x \in [\underline{x}, \overline{x}]$. By standard arguments (Gentzkow and Kamenica,
2016), the problem of finding an optimal signal structure $\sigma$ reduces to finding a
cumulative distribution function $H$ that maximizes

$$\int_{\underline{x}}^{\overline{x}} v(x)\,dH(x)$$

subject to

$$\int_{\underline{x}}^{z} H(x)\,dx \leq \int_{\underline{x}}^{z} G(x)\,dx$$

for every $z \in [\underline{x}, \overline{x}]$. The solution to such a problem is well-known and can
be found either using techniques from optimization under stochastic dominance
constraints (Gentzkow and Kamenica, 2016; Ivanov, 2020; Kleiner, Moldovanu,
and Strack, 2021) or linear programming (Kolotilin, 2018; Dworczak and Martini,
2019; Dizdar and Kováč, 2020). In our context, the optimal signal is a binary
partition of the state space. That is, the broker reveals whether the return is above
or below some threshold state.


Proposition 9. There exists a unique $\theta^W \in [\underline{\theta}, 0]$ verifying

$$\frac{1}{1 - F(\theta^W)} \int_{\theta^W}^{\overline{\theta}} \exp(\rho\theta) f(\theta)\,d\theta = 1$$

and such that Sender pools all states $\theta \in [\theta^W, \overline{\theta}]$ under the same
signal $s = 1$, i.e., $\sigma(1 \mid \theta) = 1$ for all $\theta \in [\theta^W, \overline{\theta}]$,
and similarly pools all states $\theta \in [\underline{\theta}, \theta^W[$ under
the same signal $s = 0$. Hence, the probability of inducing action $a = 1$ for Sender
is equal to

$$\int_{\theta^W}^{\overline{\theta}} f(\theta)\,d\theta = 1 - F(\theta^W).$$

Proof. See Ivanov (2020), Section 3. □


It is optimal for Sender to partition the state space at the threshold state making
Receiver indifferent between investing or not at the prior. Such an information
policy can intuitively be seen as the investment recommendation rule which
maximizes the probability that Receiver invests given the prior distribution of
returns $F$.

Using the exact same arguments as above, one can deduce that the probability
of inducing action $a = 1$ when Receiver is Bayesian is given by $1 - F(\theta^B)$ where
$\theta^B$ is the unique threshold verifying the equation

$$\frac{1}{1 - F(\theta^B)} \int_{\theta^B}^{\overline{\theta}} \theta f(\theta)\,d\theta = 0.$$

Therefore, Sender is more effective at persuading a wishful Receiver if and only if
$\theta^W < \theta^B$.


Proposition 10. It is always true that $\theta^W < \theta^B$. Hence, Sender is always more
effective at persuading a wishful rather than a Bayesian investor.


Proof. See appendix B.6. □
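To see the ordering of the two thresholds in a concrete case, the sketch below solves both indifference conditions for a uniform prior. The uniform prior on $[-2, 1]$, the value $\rho = 1$ and all function names are hypothetical choices made for illustration. They satisfy the maintained assumptions, since the prior mean return is negative and the prior exponential moment is below one.

```python
import math

TH_LO, TH_HI, RHO = -2.0, 1.0, 1.0  # hypothetical uniform prior on [TH_LO, TH_HI]

def bisect(g, lo, hi, tol=1e-12):
    """Root of an increasing function g on [lo, hi], assuming g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cond_mean_return(t):
    """E[theta | theta >= t] under the uniform prior."""
    return 0.5 * (t + TH_HI)

def cond_exp_moment(t):
    """E[exp(rho * theta) | theta >= t] under the uniform prior."""
    return (math.exp(RHO * TH_HI) - math.exp(RHO * t)) / (RHO * (TH_HI - t))

def prob_invest(t):
    """Probability 1 - F(t) that the pooling signal s = 1 is sent."""
    return (TH_HI - t) / (TH_HI - TH_LO)

# Bayesian threshold: conditional mean return equal to zero.
theta_b = bisect(cond_mean_return, TH_LO, 0.0)
# Wishful threshold: conditional exponential moment equal to one.
theta_w = bisect(lambda t: cond_exp_moment(t) - 1.0, TH_LO, 0.0)
print(theta_b, theta_w, prob_invest(theta_b), prob_invest(theta_w))
```

In this example the wishful threshold lies below the Bayesian one, so the broker recommends investing on a larger set of returns when the investor is wishful.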


The above result relates to proposition 8: buying the risky product is favored 
by the wishful investor since it is the action that yields both the highest possible 
payoff and the highest payoff variability. This example thus illustrates how the 
results in the finite state space case naturally extend to an infinite state space setting 
with linear preferences. It further helps explain the pervasiveness of persuasion
efforts in financial and betting markets, and illustrates why some financial consulting
firms seem to specialize in advice misconduct and cater to biased consumers.


2.5.3. Public persuasion and political polarization 


A Sender (e.g., a politician, a lobbyist) persuades a finite group of voters
$N = \{1, \dots, n\}$, with $n$ odd (e.g., a committee or parliamentary members), to adopt
a proposal $x \in X = \{0, 1\}$, where $x = 0$ corresponds to the status quo. The state
space is binary, $\Theta = \{0, 1\}$, and the audience uses only the information disclosed
by Sender to vote on the proposal. Let $a^i \in A = \{0, 1\}$ be the ballot cast by voter $i$,
where $a^i = 0$ designates voting for the status quo. The proposal is accepted if it is
supported by a simple majority of voters. We assume Sender is only interested in
the proposal being accepted, so her utility is $v(x) = x$. In contrast, any voter $i \in N$
has payoff function

$$u^i(x, \theta) = x\theta\beta^i + (1 - x)(1 - \theta)(1 - \beta^i)$$

for any $(x, \theta) \in X \times \Theta$ where $\beta^i \in [0, 1]$ parametrizes the partisan
preference of voter $i$. That is, all voters agree that the proposal should be implemented only
when $\theta = 1$, but they vary in how much they value the implementation of the proposal.
We assume $\beta^i$ is symmetrically distributed around 1/2 in the population. Denote by
$\beta^m = 1/2$ the median voter’s preference.

All voters form wishful beliefs and $\rho$ is assumed homogeneous among the
electorate. As a result, the direction as well as the magnitude of voters’ belief
distortion depends only on their partisan preferences $\beta^i$.^{23} By proposition 7, voter
$i$’s belief under posterior $\mu \in [0, 1]$ is given by

$$\eta(\mu, \beta^i) = \begin{cases}
\dfrac{\mu}{\mu + (1 - \mu)\exp(\rho(1 - \beta^i))} & \text{if } \mu < \mu^W(\beta^i), \\[2ex]
\dfrac{\mu\exp(\rho\beta^i)}{\mu\exp(\rho\beta^i) + (1 - \mu)} & \text{if } \mu \geq \mu^W(\beta^i),
\end{cases}$$

where

$$\mu^W(\beta^i) = \frac{\exp(\rho(1 - \beta^i)) - 1}{\exp(\rho(1 - \beta^i)) + \exp(\rho\beta^i) - 2}.$$

Remark that, similarly to Alonso and Câmara (2016), since the policy space is
binary and voters do not hold private information, there is no room for strategic
voting in our model. Hence, citizen $i$’s voting strategy under belief $\eta(\mu, \beta^i)$ is
given by

$$a(\eta(\mu, \beta^i)) = \mathbb{1}\{\mu \geq \mu^W(\beta^i)\}.$$

Due to the heterogeneity in $\beta^i$, there is always some level of belief polarization
among wishful voters for any $\mu \in\, ]0, 1[$. Let us measure such polarization by the
sum of the absolute differences between each pair of beliefs in the audience

$$\pi(\mu) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left|\eta(\mu, \beta^i) - \eta(\mu, \beta^j)\right| \tag{2.6}$$

for any $\mu \in [0, 1]$.

^{23} It has been shown in psychology (Babad, Hills, and O’Driscoll, 1992; Babad, 1995, 1997) as
well as in behavioral economics (Thaler, 2020) that voters’ political beliefs are often motivated by
their partisan orientation.


Proposition 11. Under Sender’s optimal information policy, the signal that leads 
to the implementation of the proposal also generates the maximum polarization 
among voters. 


Proof. See appendix B.5. □


To build an intuition of why this is the case, let us first note that, in our model,
belief polarization and action polarization are closely related. Agents voting for
the implementation of the proposal distort their beliefs upwards, whereas agents
voting for the status quo distort their beliefs downwards. We can thus see that
maximum belief polarization should be attained for some belief for which action
polarization is maximized, that is, for some belief at which $(n + 1)/2$ agents are
voting one way and the remaining $(n - 1)/2$ are voting the other way. This is the
case for any $\mu \in [\mu^W(\beta^{m-1}), \mu^W(\beta^{m+1})[$.

Due to sincere voting, the result of the election always coincides with the vote
of the median voter under posterior belief $\mu$. Accordingly, Sender’s indirect utility
is

$$v(\mu) = \mathbb{1}\{\mu \geq \mu^W(\beta^m)\},$$

for any $\mu \in [0, 1]$. The optimal information policy for Sender is thus supported on
$\{0, \mu^W(\beta^m)\}$ whenever $\mu_0 \in\, ]0, 1/2[$, and on $\{\mu_0\}$ whenever
$\mu_0 \in\, ]\mu^W(\beta^m), 1[$.
The posterior $\mu^W(\beta^m)$, which leads to the implementation of the proposal, belongs
to the interval $[\mu^W(\beta^{m-1}), \mu^W(\beta^{m+1})[$ and, as such, is in the neighbourhood of
the belief that maximizes polarization for any distribution of preferences. When
such distribution is symmetric around the median voter, polarization is maximized
exactly at the middle point in that interval, which is $\mu^W(\beta^m)$.

We illustrate proposition 11 in figure 2.6 in a setup with 3 voters.
Following corollary 3, wishful thinking induces voters to switch from disapproval
to approval at different Bayesian posteriors $\mu^W(\beta^i)$. The optimal information policy
$\tau$ for Sender is the one that maximizes the probability of the median voter voting
for approval. That is, $\mathrm{supp}(\tau) = \{0, \mu^W(\beta^2)\}$ and $\mu^W(\beta^2) = 1/2$ is
induced with probability $\tau = \mu_0/\mu^W(\beta^2)$ whenever $\mu_0 \in\, ]0, \mu^W(\beta^2)[$,
and $\mathrm{supp}(\tau) = \{\mu_0\}$ whenever $\mu_0 \in\, ]\mu^W(\beta^2), 1[$.

Let us now turn to polarization. First, it is quite easy to see in figure 2.6 that

$$\pi(\mu) = 2\left|\eta(\mu, \beta^1) - \eta(\mu, \beta^3)\right|$$

for any $\mu \in [0, 1]$, as the distances to the median belief add up to
$|\eta(\mu, \beta^1) - \eta(\mu, \beta^3)|$.
Thus, it suffices to check where $|\eta(\mu, \beta^1) - \eta(\mu, \beta^3)|$ is maximized. Quite
naturally, polarization is maximized when the posterior belief induced by Sender is in
between $\mu^W(\beta^3)$ and $\mu^W(\beta^1)$. In particular, it is exactly maximized at the posterior
belief $\mu^W(\beta^2) = 1/2$, which is exactly the posterior belief Sender induces to obtain
the approval of the proposal under her optimal policy.

Figure 2.6: Belief distortions in the electorate for $\rho = 2$, $\beta^1 = 1/4$, $\beta^2 = 1/2$ and
$\beta^3 = 3/4$. Polarization equals $\pi(\mu) = 2|\eta(\mu, \beta^1) - \eta(\mu, \beta^3)|$, which is
maximized at $\mu^W(\beta^2) = 1/2$.
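The three-voter example can be reproduced numerically. The following sketch evaluates the wishful thresholds and beliefs for $\rho = 2$ and partisan preferences 1/4, 1/2 and 3/4, computes polarization as the sum of pairwise absolute belief differences, and searches a grid of posteriors for its maximizer. The grid resolution and the function names are illustrative assumptions.

```python
import math

RHO = 2.0
BETAS = [0.25, 0.5, 0.75]  # partisan preferences used in the three-voter example

def threshold(beta, rho=RHO):
    """Posterior mu^W(beta) at which a wishful voter switches to approval."""
    a = math.exp(rho * (1 - beta))
    b = math.exp(rho * beta)
    return (a - 1) / (a + b - 2)

def wishful_belief(mu, beta, rho=RHO):
    """Distorted belief eta(mu, beta) that the state equals one."""
    if mu < threshold(beta, rho):
        # Voting for the status quo: belief tilted toward state 0.
        return mu / (mu + (1 - mu) * math.exp(rho * (1 - beta)))
    # Voting for the proposal: belief tilted toward state 1.
    w = mu * math.exp(rho * beta)
    return w / (w + (1 - mu))

def polarization(mu, betas=BETAS, rho=RHO):
    """Sum of absolute differences between each pair of beliefs."""
    beliefs = [wishful_belief(mu, b, rho) for b in betas]
    n = len(beliefs)
    return sum(abs(beliefs[i] - beliefs[j]) for i in range(n) for j in range(i + 1, n))

grid = [i / 1000 for i in range(1001)]
mu_star = max(grid, key=polarization)
print([threshold(b) for b in BETAS], mu_star, polarization(mu_star))
```

On this grid the polarization-maximizing posterior coincides with the median voter's threshold, which is the posterior Sender induces under her optimal policy.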

Proposition 11 establishes that the intuition developed in this example is generally
valid when the partisan preferences of voters are symmetrically distributed around
the median. In other words, attempts by a rational sender to maximize the
probability of approval induce, as an externality, maximal belief polarization
among wishful voters. This result differs from the literature studying the possible
heterogeneity of beliefs due to deliberate attempts at persuasion, which tends to
focus on polarization arising from differential access to information.^{24} Our model
gives an alternative mechanism for the rise of polarization, based on motivated
beliefs: a sender can induce polarization involuntarily when her message is subject
to motivated interpretations, and such polarization might be especially large
whenever the sender’s strategy involves targeting an agent with a median preference.


^{24} See Arieli and Babichenko (2019) for general considerations on the private persuasion of
multiple receivers and see Chan, Gupta, Li, and Wang (2019) for an application to voting.


2.6. CONCLUSION


In this paper we study optimal persuasion in the presence of a wishful Receiver. By
modeling wishful thinking as a process that optimally trades off gains in anticipatory
utility against the cost of distorting beliefs, we characterize the correspondence
between wishful and Bayesian beliefs, highlighting the particularities that such a
belief formation process entails.

In particular, we show that wishful thinking impacts behavior, causing some 
actions to be favored in the sense that they are taken at a greater set of beliefs. This 
has important implications for the strategic design of information, as it adds some 
nuance on the way preferences and information determine behavior. Concretely, 
we show that, in the presence of wishful thinking, persuasion is more effective 
when it is aimed at inducing actions that are risky but can potentially yield a very 
large payoff and less effective when it is aimed at inducing more cautious actions. 
We use this model to illustrate why information disclosure seems less effective than
expected at inducing preventive health behavior and more effective than expected
at inducing dubious financial investments. Wishful thinking opens a channel for
preferences to interfere in belief formation, raising the question of what kind of
belief polarization we could observe in a population in which agents have access
to the same information but vary in their preferences. We show in an application
that an information designer interested in the approval of a proposal would, by
optimally targeting the median voter in her choice of signal structure, induce, as an
externality, maximum polarization among the electorate whenever the proposal is
approved.

Some studies already investigate the effects of wishful thinking on the outcomes
of strategic interactions (see Yildiz, 2007; Banerjee, Davis, and Gondhi, 2020;
Heller and Winter, 2020). Further investigation of the ways in which individual
preferences might impact information processing, and of how these may affect social
phenomena such as belief polarization in non-strategic and strategic settings, seems
to be a promising path for future research.


3. PRICE DISCRIMINATION WITH REDISTRIBUTIVE CONCERNS^{1}


Abstract 


Consumer data can be used to sort consumers into different market 
segments, allowing a monopolist to charge different prices at each 
segment. We study consumer-optimal segmentations with redistributive
concerns, i.e., that prioritize poorer consumers. Such segmentations are
efficient but may grant additional profits to the monopolist, compared 
to consumer-optimal segmentations with no redistributive concerns. 
We characterize the markets for which this is the case and provide 
a procedure for constructing optimal segmentations given a strong 
redistributive motive. For the remaining markets, we show that the 
optimal segmentation is surprisingly simple: it generates one segment 
with a discount price and one segment with the same price that would 
be charged if there were no segmentation. 


3.1. INTRODUCTION 


Consumers are continuously leaving traces of their identities on the internet, be it
through social media activity, search-engine utilization, online purchasing and so on.
The vast amount of consumer data that is generated and collected has acquired the
status of a highly-valued good, as it allows firms to tailor advertisements and prices
to different consumers. In practice, the availability of consumer data segments
consumers: observing that a given consumer has certain characteristics allows
firms to fine-tune how they interact with people that share those characteristics.
The coarseness of the information available about consumers determines
how they will be segmented, what sort of digital market interactions they will have
and what prices they will pay. This suggests room for regulatory oversight.

^{1} This chapter is joint work with Daniel Barreto and Alexis Ghersengorin. We thank Eduardo
Perez-Richet for his guidance on this project. We also thank Matthew Elliott, Jeanne Hagenbach,
Emeric Henry, Emir Kamenica, Frédéric Koessler, Shengwu Li, Franz Ostrizek, Nikhil Vellodi and
Colin Stewart for their valuable feedback and comments at various stages of the project, as well as
seminar audiences at Sciences Po, Paris School of Economics, University of Konstanz, CUNEF,
University of Rome “Tor Vergata,” University of Barcelona, University of Amsterdam and WU
Vienna for helpful discussions. All remaining errors are ours. This project has received funding
from the European Research Council (ERC) under the European Union’s Horizon 2020 research
and innovation programme (grant agreements 850996, MOREV, and 101001694, IMEDMC).

As shown by Bergemann, Brooks, and Morris (2015), consumer segmentation 
and price discrimination can induce a wide range of welfare outcomes. It can not
only be used to increase social surplus, by creating segments with prices that
allow more consumers to buy, but can also be performed in a way that ensures
that all created surplus accrues to consumers, that is, in a way that maximizes consumer
surplus. This is done by creating segments that pool together consumers with high
and low willingness to pay, thus allowing higher willingness to pay consumers to 
benefit from lower prices. However, an important aspect of price discrimination 
that remains overlooked by the literature is its distributive effect: since different 
consumers pay different prices, this practice defines how surplus is distributed 
across consumers, raising questions about how it can benefit poorer consumers 
relative to richer ones. Indeed, if willingness to pay and wealth are positively 
related, segmentations that maximize total consumer surplus tend to benefit richer 
consumers. 

In this paper we provide a normative analysis of the distributive impacts of market 
segmentation. Our aim is to study how this practice impacts different consumers 
and how it should be performed under the objective of increasing consumer welfare 
while prioritizing poorer consumers. Our results draw qualitative characteristics of 
segmentations that achieve this goal, which can be used to inform future regulation. 
Importantly, our analysis also shows that the prioritization of poorer consumers 
can be inconsistent with the maximization of total consumer surplus: raising the 
surplus of poorer consumers may only be possible while granting additional profits 
to the producer, at the expense of richer consumers. 

We consider a setting in which a monopolist sells a good on a market composed 
of heterogeneous consumers, each of whom can consume at most one unit and 
is characterized by their willingness to pay for the good. A social planner can 
provide information about consumers’ willingness to pay to the monopolist. The 
information provision strategy effectively divides the aggregate pool of consumers 
into different segments, each of which can be priced differently by the monopolist. 
The social planner’s objective is to maximize a weighted sum of consumers’ 
surplus. As in Dworczak, Kominers, and Akbarpour (2021), we consider weights
that are decreasing in the consumer’s willingness to pay, capturing the notion of a
redistributive motive under the assumption that consumers with higher willingness
to pay are on average richer than those with lower willingness to pay.

We first establish that optimal segmentations are Pareto efficient, such that
satisfying a redistributive objective does not come at the expense of social surplus.
Bergemann et al. (2015) show that, in the absence of redistributive concerns, 
consumer-optimal segmentations do not strictly benefit the monopolist: all of the 
surplus created by the segmentation accrues to consumers. In contrast, we show that 
once redistributive preferences are considered, consumer-optimal segmentations 
may imply additional profits to the monopolist. This happens because increasing 
the surplus of poor consumers is done by pooling them with even poorer consumers, 
such that they can benefit from lower prices. In doing so, richer consumers become 
more representative in other segments, which might increase the price they pay. 
We characterize the set of markets for which this is the case and denote them as 
rent markets. For no-rent markets, on the contrary, any redistributive objective 
can be met while still maximizing total consumer surplus. In this case, our 
analysis selects one among the many consumer-optimal segmentations established 
by Bergemann et al. (2015). These insights are illustrated through a three-type 
example in section 3.3. 

Our analysis also provides insights on how to construct optimal segmentations. 
We show that, in no-rent markets, consumer-optimal segmentations with
redistributive concerns exhibit a strikingly simple form, dividing consumers
into two segments: one where the price is the same as would be charged under
no segmentation and one with a discount price. In rent markets, we show that
consumer-optimal segmentations under sufficiently strong redistributive
preferences divide consumers into contiguous segments based on their willingness
to pay, with consumers with the same willingness to pay belonging to at most
two different segments. This allows us to construct a procedure that generates
consumer-optimal segmentations under strong redistributive preferences, which is
discussed in section 3.4.2.


Related literature. Third-degree price discrimination and its welfare effects 
are the subject of an extensive literature. Early analysis (Pigou, 1920; Robinson, 
1933) and subsequent development (Schmalensee, 1981; Varian, 1985) considered 
exogenously fixed market segmentations and studied conditions under which such 
segmentations would increase or decrease total surplus. 

This literature has recently undergone a transformation, prompted by both 
technical innovations in microeconomic theory and the change in character of the 
practice of price discrimination brought about by the ascent of digital markets. 
Recent developments incorporate an information design approach to study the
welfare impacts of third-degree price discrimination over all possible market
segmentations, rather than taking a segmentation as exogenously fixed. Bergemann
et al. (2015) analyze a setting with a monopolist selling a single good and
characterize attainable pairs of consumer and producer surplus, showing that any
distribution of total surplus over consumers and producer that guarantees at least
the uniform-price profit for the producer is attainable. In particular, they show
that there are typically many consumer-optimal segmentations of a given market.
Their analysis has been extended to multi-product settings by Haghpanah and 
Siegel (2022a,b) and to imperfect competition settings by Elliott, Galeotti, Koh, 
and Li (2021) and Ali, Lewis, and Vasserman (2022). Hidir and Vellodi (2020) 
study market segmentation in a setting where the monopolist can offer one from a 
continuum of goods to each consumer, such that consumers, upon disclosing their 
information, face a trade-off between being offered their best option and having to 
pay a fine-tuned price. Finally, Roesler and Szentes (2017) and Ravid, Roesler, and 
Szentes (2022) study the inverse problem of information design to a buyer who is 
uncertain about the value of a good. Our paper differs from these by focusing on 
how surplus is distributed across consumers, and by studying consumer-optimal 
segmentations when different consumers are assigned different welfare weights. 
We show that, once distributional preferences are taken into account, optimal 
segmentations might not coincide with consumer-optimal segmentations under 
uniform welfare weights. When they do, our analysis selects one among the many 
direct consumer-optimal segmentations established in Bergemann et al. (2015). 

Our paper also relates to a recent literature on mechanism design and
redistribution, most notably Dworczak et al. (2021) and Akbarpour, Dworczak,
and Kominers (2020), who study the design of allocation mechanisms under
redistributive concerns, and Pai and Strack (2022), who study the optimal taxation
of a good with a negative externality when agents differ in their utility for
the good, disutility for the externality and marginal value for money. A key
difference between the results obtained in these papers and ours is that, in their
settings, redistributive mechanisms are not Pareto efficient: redistribution implies some
loss in social surplus. This is not the case in our paper, where optimal redistributive
segmentations always maximize total surplus.

Finally, our paper relates to Dubé and Misra (2022), who study experimentally
the welfare implications of personalized pricing implemented through
machine learning. The authors find a negative impact of personalized pricing
on total consumer surplus, but note that a majority of consumers benefit from
price reductions under personalization, pointing out that under some inequality-averse
weighted welfare functions, data-enabled price personalization might increase
welfare. Their paper shows experimentally how the implementation of market
segmentations aimed at maximizing profits might generate, as a by-product, the
redistribution of surplus among consumers. Our paper, on the other hand, shows
theoretically how consumer-optimal redistributive segmentations might grant
additional profits to the firm.


3.2. MODEL 


3.2.1. Setup 


A monopolist (he) sells a good to a continuum of mass one of buyers, each of whom 
can consume at most one unit. We normalize the marginal cost of production of the 
good to zero. The consumers privately observe their type $v$, which corresponds to 
their willingness to pay for the good. We assume that the consumers' type can take 
a finite number $K$ of possible values $V = \{v_1, \dots, v_K\}$, where $0 < v_1 < \dots < v_K$. 
We let $\mathcal{K} := \{1, \dots, K\}$. A market $\mu$ is a distribution over the valuations. We 
denote the set of all possible markets by: 

\[ M := \Delta(V) = \Big\{ \mu \in \mathbb{R}^K \;\Big|\; \sum_{k \in \mathcal{K}} \mu_k = 1 \text{ and } \mu_k \geq 0 \text{ for all } k \in \mathcal{K} \Big\}. \]

Price $v_k$ is optimal for market $\mu \in M$ if it maximizes the expected revenue of the 
monopolist when facing market $\mu$, that is: 

\[ v_k \sum_{i=k}^{K} \mu_i \geq v_j \sum_{i=j}^{K} \mu_i, \quad \forall j \in \mathcal{K}. \]

Let $M_k$ denote the set of markets where price $v_k$ is optimal. It is given by: 

\[ M_k = \Big\{ \mu \in M \;\Big|\; v_k \in \arg\max_{v_i \in V} v_i \sum_{j=i}^{K} \mu_j \Big\}, \]

for any $k \in \mathcal{K}$. In the remainder of the paper we will hold an aggregate market 
fixed and denote it by $\mu^0 \in M$. 


Segmentation. The consumers’ types are perfectly observed by a social planner 
(she) who can segment consumers, that is, sort consumers into different sub-markets. 
The set of possible segmentations of an aggregate market $\mu^0$ is given by: 

\[ \Sigma(\mu^0) = \Big\{ \sigma \in \Delta(M) \;\Big|\; \int_{M} \mu \, \sigma(d\mu) = \mu^0 \Big\}. \]

Formally, a segmentation is a probability distribution on $M$ which averages to the 
aggregate market $\mu^0$. The requirement that the different segments generated by a 
segmentation average to the aggregate market ensures that the segmentation simply 
sorts existing consumers into different groups, without fundamentally altering the 
aggregate composition of consumers in a market. This requirement is akin to 
the Bayes Plausibility condition that is typically used in the Bayesian Persuasion 
literature (Kamenica and Gentzkow, 2011). 

Given a segmentation $\sigma$, the monopolist can price differently at each segment 
$\mu$ in the support of $\sigma$. A pricing rule is a mapping $p \colon M \to V$. As will become 
clear in problem 3.4, segments with more than one optimal price play a key role in 
our results. We focus on the following pricing rule: 

\[ p(\mu) = \min \Big\{ \arg\max_{v_k \in V} v_k \sum_{i=k}^{K} \mu_i \Big\}. \]

At each segment, the monopolist charges the smallest price among all optimal 
prices in that segment. This pricing rule makes the objective of the social planner 
(stated in equation (P)) upper semi-continuous and ensures the existence of an 
optimal segmentation.² 
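The pricing rule above can be illustrated with a minimal Python sketch. The function names `revenues` and `lowest_optimal_price` and the two example markets are our own illustrative choices, not part of the model; exact fractions are used so that ties between prices are detected without rounding error.

```python
from fractions import Fraction

def revenues(values, market):
    """Revenue v_k * sum_{i >= k} mu_i obtained by charging each price v_k."""
    return [v * sum(market[k:]) for k, v in enumerate(values)]

def lowest_optimal_price(values, market):
    """Pricing rule p(mu): the smallest price among the revenue maximizers."""
    rev = revenues(values, market)
    best = max(rev)
    return min(v for v, r in zip(values, rev) if r == best)

values = [1, 2, 3]
interior = [Fraction(3, 10), Fraction(2, 5), Fraction(3, 10)]  # v_2 strictly optimal
boundary = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]       # v_1 and v_2 tie

print(lowest_optimal_price(values, interior))  # 2
print(lowest_optimal_price(values, boundary))  # 1: the tie is broken downwards
```

With floating-point masses the equality test `r == best` may miss exact ties, which is why the sketch relies on `Fraction`.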


Social objective. The social planner's objective is to maximize a weighted sum 
of consumers' surplus, with positive weights $\lambda \in \mathbb{R}^K_+$. Each dimension $\lambda_k$ of the 
vector $\lambda$ corresponds to the marginal contribution to social welfare of consumers 
of type $v_k$. The surplus of a consumer of type $v_k$ in market $\mu$ is given by: 

\[ U_k(\mu) = \max\{0, v_k - p(\mu)\}. \]

The weighted consumer surplus on market $\mu$ is given by: 

\[ W(\mu) = \sum_{k \in \mathcal{K}} \lambda_k \mu_k U_k(\mu), \]


for any $\mu \in M$. Hence, for any aggregate market $\mu^0$, the social planner's objective 
is given by the following maximization program: 

\[ \max_{\sigma \in \Sigma(\mu^0)} \int_{M} W(\mu) \, \sigma(d\mu). \tag{P} \]

² Although technically important, this pricing rule does not impact our results qualitatively. 
Indeed, any joint distribution of consumers and prices that can be induced by the social planner 
under this pricing rule could be approximated arbitrarily well by a social planner facing a monopolist 
who selects among optimal prices in some other way. 


Given an aggregate market $\mu^0$, a segmentation $\sigma \in \Sigma(\mu^0)$ is optimal if it solves 
(P). We focus on welfare weights that are decreasing in the consumer's willingness 
to pay, such that $\lambda_k \geq \lambda_{k'}$ for any $k < k' \leq K - 1$, and say that the social planner 
has redistributive preferences if the inequality holds strictly for some $k, k' \in \mathcal{K}$. 
Under the assumption that consumers with lower willingness to pay are on average 
poorer than consumers with higher willingness to pay, this amounts to attributing a 
greater weight to surplus accruing to poorer consumers.³ 
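As a rough illustration of program (P) for finitely supported segmentations, the sketch below evaluates the weighted consumer surplus $W(\mu)$ and the planner's objective. The representation of a segmentation as a list of `(probability, market)` pairs, the function names and the weights `(3, 2, 1)` are assumptions made for this example only.

```python
from fractions import Fraction

def lowest_optimal_price(values, market):
    """Pricing rule p(mu): smallest revenue-maximizing price."""
    rev = [v * sum(market[k:]) for k, v in enumerate(values)]
    best = max(rev)
    return min(v for v, r in zip(values, rev) if r == best)

def weighted_surplus(values, weights, market):
    """W(mu) = sum_k lambda_k * mu_k * max(0, v_k - p(mu))."""
    p = lowest_optimal_price(values, market)
    return sum(l * m * max(0, v - p) for l, m, v in zip(weights, market, values))

def objective(values, weights, segmentation):
    """Planner's objective for a list of (probability, market) pairs."""
    return sum(prob * weighted_surplus(values, weights, mkt)
               for prob, mkt in segmentation)

values = [1, 2, 3]
weights = [3, 2, 1]  # hypothetical decreasing welfare weights
mu0 = [Fraction(3, 10), Fraction(2, 5), Fraction(3, 10)]

no_segmentation = [(1, mu0)]
# Perfect discrimination: each type is isolated in a degenerate market.
perfect = [(mu0[0], [1, 0, 0]), (mu0[1], [0, 1, 0]), (mu0[2], [0, 0, 1])]

print(objective(values, weights, no_segmentation))  # 3/10
print(objective(values, weights, perfect))          # 0
```

Without segmentation only type $v_3$ earns surplus (one unit each, with weight 1), while perfect discrimination leaves consumers with nothing.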


Efficiency. Every consumer has a value for the good that is strictly greater than 
the marginal cost of production. Hence, social surplus is maximized when every 
consumer buys the good. We say that a market $\mu$ is efficient if every consumer can 
buy the good, that is, if the lowest optimal price for the seller at that market allows 
everyone to consume: $p(\mu) = \min \operatorname{supp}(\mu)$. For a given market $\mu$ and Pareto 
weights $\lambda$, the maximum feasible social surplus is thus given by 

\[ S(\mu) = \sum_{k \in \mathcal{K}} \lambda_k \mu_k v_k. \]

Note that a segmentation of $\mu$ achieves $S(\mu)$ if and only if it is efficient. A 
segmentation $\sigma$ is efficient if it is only supported on efficient markets. 


Informational Rents. The profit of the monopolist at market $\mu$ is given by: 

\[ \pi(\mu) = p(\mu) \sum_{k \in C_{p(\mu)}} \mu_k, \]

where $C_p = \{k \in \mathcal{K} \mid v_k \geq p\}$. The profit of the monopolist under segmentation $\sigma$ 
is given by: 

\[ \Pi(\sigma) = \int_{M} \pi(\mu) \, \sigma(d\mu). \]

Segmenting the aggregate market can only weakly increase the expected profit of the 
monopolist relative to no segmentation. Therefore, we always have $\Pi(\sigma) \geq \pi(\mu^0)$ 
for any $\sigma \in \Sigma(\mu^0)$. We say that some segmentation $\sigma$ grants a rent to the 
monopolist whenever $\Pi(\sigma) > \pi(\mu^0)$. 

³ We follow here the approach by Dworczak et al. (2021). 
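The profit and rent definitions can be checked numerically on small examples. In the sketch below, the helper names and the two segmentations of $(0.3, 0.4, 0.3)$ with values $(1, 2, 3)$ are our own illustrative assumptions: `perfect` isolates each type, while `pooled` is a two-segment split whose numbers we computed by hand so that the uniform price $v_2$ stays optimal in both segments.

```python
from fractions import Fraction

def lowest_optimal_price(values, market):
    """Pricing rule p(mu): smallest revenue-maximizing price."""
    rev = [v * sum(market[k:]) for k, v in enumerate(values)]
    best = max(rev)
    return min(v for v, r in zip(values, rev) if r == best)

def profit(values, market):
    """pi(mu) = p(mu) * mass of consumers with v_k >= p(mu)."""
    p = lowest_optimal_price(values, market)
    return p * sum(m for v, m in zip(values, market) if v >= p)

def total_profit(values, segmentation):
    """Pi(sigma) for a list of (probability, market) pairs."""
    return sum(prob * profit(values, mkt) for prob, mkt in segmentation)

def grants_rent(values, aggregate, segmentation):
    """True when the segmentation strictly raises the monopolist's profit."""
    return total_profit(values, segmentation) > profit(values, aggregate)

values = [1, 2, 3]
mu0 = [Fraction(3, 10), Fraction(2, 5), Fraction(3, 10)]
perfect = [(mu0[0], [1, 0, 0]), (mu0[1], [0, 1, 0]), (mu0[2], [0, 0, 1])]
pooled = [
    (Fraction(3, 5), [Fraction(1, 2), Fraction(4, 9), Fraction(1, 18)]),
    (Fraction(2, 5), [Fraction(0), Fraction(1, 3), Fraction(2, 3)]),
]

print(grants_rent(values, mu0, perfect))  # True: profit rises from 7/5 to 2
print(grants_rent(values, mu0, pooled))   # False: profit stays at 7/5
```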


Uniformly Weighted Consumer-Optimal Segmentations. If $\lambda_k = \lambda_{k'} > 0$ 
for all $k, k' \in \mathcal{K}$, program (P) corresponds to the maximization of the total 
consumer surplus over all possible segmentations. A segmentation that solves this 
optimization problem is named uniformly weighted consumer-optimal. As shown 
in Bergemann et al. (2015), uniformly weighted consumer-optimal segmentations 
(i) are efficient, and hence achieve the maximum feasible social surplus, and 
(ii) do not grant the monopolist any rent. For an interior aggregate market $\mu^0$, there 
exist infinitely many uniformly weighted consumer-optimal segmentations. In 
section 3.4.3, we characterize the set of aggregate markets for which consumer-optimal 
segmentations with redistributive preferences are also uniformly weighted 
consumer-optimal, thus providing a natural way to select among these segmentations 
for such markets. 


3.2.2. Discussion of the model 


Information provision as segmentation. In digital markets, information provision 
about consumers often occurs through the assignment of labels to different 
consumers. Indeed, one could think of a model in which the social planner adopts 
a signal structure $\xi \colon V \to \Delta(L)$, where $L$ is a set of labels. The meaning of each 
label is then pinned down by the social planner's strategy, and the monopolist 
optimally chooses different prices for consumers with different labels. 

Such a model is equivalent to ours. Indeed, any segmentation $\sigma \in \Sigma(\mu^0)$ can 
be implemented by some signal structure $\xi$, and any signal structure $\xi$ implements 
some segmentation $\sigma \in \Sigma(\mu^0)$. The approach of working directly in the space of 
feasible distributions over markets rather than in the space of labeling strategies is 
standard in the information design literature (Kamenica and Gentzkow, 2011). 


Continuum of consumers. While we consider a setting with a continuum of 
consumers, our model is equivalent to one in which there is a discrete number 
of consumers, with types independently distributed according to $\mu^0$. Under this 
interpretation, the social planner commits ex-ante to an information structure $\sigma$ to 
inform the monopolist, which defines the distribution of posterior beliefs $\mu$ that 
the monopolist will form upon facing each consumer. 
3.3. THREE VALUES CASE 


In this section, we illustrate our model and some of the results from the following 
sections in the simple three-value case. 


Setup. Let us consider three types, $v_1 = 1$, $v_2 = 2$ and $v_3 = 3$. We can conveniently 
depict the set of markets $M$ as the two-dimensional unit simplex (see Mas-Colell, 
Whinston, and Green, 1995, p. 169). It is depicted in figure 3.1, where each vertex 
of the simplex represents a degenerate market on a value $v \in V$, denoted by the 
Dirac measure $\delta_v$. 

In the left panel of figure 3.1 are drawn the three different price regions $M_1$, 
$M_2$ and $M_3$. The points in each of the regions correspond to the markets for which 
each of the different prices $\{1, 2, 3\}$ is optimal for the monopolist.⁴ The border 
between two adjacent regions represents markets for which there is more than 
one optimal price. Given pricing rule $p$, the price charged in such markets is the 
lowest among the optimal ones. 

In the right panel, an aggregate market $\mu^0 = (0.3, 0.4, 0.3)$ is represented, 
which is in the interior of the region $M_2$, meaning that $v_2$ is a strictly optimal price 
for $\mu^0$. Two possible segmentations are depicted: the one in green dashed lines, 
which segments $\mu^0$ into the three degenerate markets (thus implementing first-degree 
price discrimination); and the one in black dotted lines, which segments $\mu^0$ into three 
segments: $\mu'$, containing all three types and priced at $v_1$; $\mu''$, containing 
only types $v_2$ and $v_3$ and priced at $v_2$; and $\mu'''$, containing all three types and 
priced at $v_3$. 

Any splitting of $\mu^0$ into a set of points $S \subset M$ represents a feasible segmentation, 
as long as $\mu^0 \in \operatorname{co}(S)$.⁵ A segmentation is optimal given weights $(\lambda_1, \lambda_2, \lambda_3)$, 
with $\lambda_1 \geq \lambda_2 \geq \lambda_3$, if it maximizes the sum of weighted consumer surplus over 
all segments generated. Note that consumers of type $v_1$ never get any consumer 
surplus (since the monopolist never charges a price lower than their willingness 
to pay), such that the optimal segmentation trades off surplus obtained by types 
$v_2$ and $v_3$. We will focus, without loss of generality, on direct segmentations, i.e. 
segmentations in which there is not more than one segment with a given price. 


⁴ Formally, for any $k$, $M_k = \operatorname{cl}(p^{-1}(v_k))$, where $\operatorname{cl}(S)$ denotes the topological closure of a 
generic set $S$. 
⁵ For any set $S$, $\operatorname{co}(S)$ denotes the convex hull of $S$. 

Figure 3.1: The Simplex representing $M$ and two feasible segmentations. 


General properties of optimal segmentations. A first step for finding the 
optimal segmentation of $\mu^0$ is to observe that any optimal segmentation must be 
efficient. To see that, consider the black dotted segmentation in the right panel of 
figure 3.1. Both $\mu'$ and $\mu''$ are efficient, since all the consumers in these segments 
are able to buy the good. The remaining segment $\mu'''$, however, is not efficient, as 
it contains some consumers with type $v_1$ and $v_2$ who are not able to consume under 
that segment's price. One could solve that by re-segmenting $\mu'''$ in the following 
way: creating a segment $\mu_1'''$ containing all of the types $v_1$ and $v_2$ and some of the 
types $v_3$ that used to belong to $\mu'''$, and another segment $\delta_3$ containing only the 
remaining types $v_3$. Note that the amount of type $v_3$ in $\mu_1'''$ can be adjusted to ensure 
that this segment will have price $v_1$. That way, both of the resulting segments will 
be efficient. Furthermore, this re-segmentation of $\mu'''$ unambiguously increases 
consumer welfare, since it has no impact on the welfare of consumers in $\mu'$ and $\mu''$ 
and (weakly) increases the surplus of every consumer previously belonging to $\mu'''$. 

Indeed, a welfare-increasing segmentation can be performed on any inefficient 
market. This narrows down the search for an optimal segmentation, as we know 
that it must be supported only on efficient segments. The left panel of figure 3.2 
depicts, in orange, the efficient markets. These are: the degenerate market $\delta_3$; the 
set of markets in region $M_2$ that have no consumer with value 1; and the entire 
region $M_1$. 

We can further note that, in an optimal segmentation, the segment with price $v_1$ 
must not belong to the interior of region $M_1$. To see that, consider the right panel 
of figure 3.2. In it are depicted two segmentations: $\sigma_a$, which splits $\mu^0$ into $\mu_a$ and 
$\mu_a'$, and $\sigma_b$, which splits $\mu^0$ into $\mu_b$ and $\mu_b'$. Segmentation $\sigma_b$ is always preferred 
over $\sigma_a$ for two reasons. First, $\mu_b$ has a higher share of types $v_2$ and $v_3$ than $\mu_a$. 
Since these are the only two types that are extracting surplus on the segment whose 
price is $v_1$, having a higher share of them increases the social planner's objective. 
Second, $\mu_b$ is "closer" to $\mu^0$, which means that $\sigma_b(\mu_b) > \sigma_a(\mu_a)$. That means 
that segmentation $\sigma_b$ is able to include a bigger mass of consumers in the segment 
where they will extract the largest surplus, thus also increasing the social planner's 
objective. 

Figure 3.2: Efficient markets and segmentations. 

The argument outlined above illustrates how every segmentation generating a 
segment in the interior of region $M_1$ must be dominated by some segmentation 
that instead generates a segment on the boundary of regions $M_1$ and $M_2$. This 
amounts to saying that any optimal segmentation must include a segment in which 
the monopolist is indifferent between charging price $v_1$ or charging some other 
price. The intuition for that is simple: if the monopolist strictly prefers to charge 
price $v_1$ in that segment, then there is still room for "fitting" other types in that 
segment in a Pareto-improving way. 


Uniformly weighted consumer-optimal segmentations. We begin by considering 
the case where $\lambda_1 = \lambda_2 = \lambda_3$. The left panel of figure 3.3 depicts three different 
segmentations, $\sigma_a$, $\sigma_b$ and $\sigma_c$, each of them generating one segment with price $v_1$ 
and one segment with price $v_2$. All three of these segmentations are uniformly 
weighted consumer-optimal. This follows from the fact that i) they maximize total 
(consumer + producer) surplus, since they are all efficient, and ii) the monopolist 
does not get any of the surplus that is created by the segmentation.⁶ 

Indeed, there are uncountably many uniformly weighted consumer-optimal 
segmentations of $\mu^0$. All of these are equivalent in that they maximize total 
consumer surplus, but they are not equivalent in how they distribute such surplus 
across consumers. This can be seen in the right panel of figure 3.3: while the three 
segmentations of the left panel induce the same profit for the monopolist and the 
same total consumer surplus, $\sigma_c$ induces greater surplus for consumers of type $v_2$ 
than the other segmentations. This is so because, among the segments priced at $v_1$, 
$\mu_c$ is the one that includes the most consumers of type $v_2$, who can then benefit 
from a low price. 

⁶ One way of seeing this is as follows: A decision-maker strictly benefits from observing a piece 
of information if, as a result of this observation, she is able to make better decisions than she would 
have made absent this information. In our setting, this amounts to the monopolist being able to, as 
a result of the segmentation, choose different prices than the uniform price, at markets in which 
these different prices are strictly preferred over the uniform price. Since price $v_2$ belongs to the set 
of optimal prices in every segment generated by the segmentations in figure 3.3, the monopolist 
does not strictly benefit from them. 

Figure 3.3: Uniformly Weighted Consumer-Optimal Segmentations. 


Consumer-Optimal segmentations under redistributive preferences. Let us 
now consider the case when $\lambda_2 > \lambda_3$. Among the segmentations depicted in the 
left panel of figure 3.3, segmentation $\sigma_c$ is now preferred over $\sigma_a$ and $\sigma_b$. But is it 
optimal? One way of increasing the surplus of consumers of type $v_2$ further is to 
exchange consumers between the two segments generated by $\sigma_c$: by exchanging 
the remaining consumers of type $v_3$ that are present in $\mu_c$ against some of the 
consumers of type $v_2$ present in $\mu_c'$, one can increase the amount of types $v_2$ 
that pay a low price. While this exchange increases the surplus of types $v_2$, it 
dramatically decreases the surplus of types $v_3$, since now there are sufficiently 
many of them in segment $\mu_c'$ for the monopolist to want to increase the price 
charged at that segment. This would lead to a segmentation that is no longer 
uniformly weighted consumer-optimal: the price increase in segment $\mu_c'$ would 
cause some of the surplus that was previously captured by consumers of type $v_3$ to 
now be granted to the monopolist instead. The result below establishes when this 
exchange is desirable from the social planner's perspective. 


Result 1. Let $\mu^0 = (0.3, 0.4, 0.3)$. Then, the two following assertions are satisfied: 

(i) If the inequality 
\[ \frac{\lambda_2}{\lambda_3} \leq \frac{v_3 + v_2 - v_1}{v_2 - v_1} \]
is satisfied, then the consumer-optimal segmentation under redistributive 
preferences is also uniformly weighted consumer-optimal and generates two 
segments, one supported on $\{v_1, v_2, v_3\}$ and the other supported on 
$\{v_2, v_3\}$. This segmentation is represented in the left panel of figure 3.4; 

(ii) If the inequality 
\[ \frac{\lambda_2}{\lambda_3} > \frac{v_3 + v_2 - v_1}{v_2 - v_1} \]
is satisfied, then the consumer-optimal segmentation under redistributive 
preferences is not uniformly weighted consumer-optimal and generates three 
segments. The first one is supported on $\{v_1, v_2\}$, the second is supported on 
$\{v_2, v_3\}$, and the third is supported on $\{v_3\}$. This segmentation is represented 
in the right panel of figure 3.4. 

Figure 3.4: Optimal Segmentations with Redistributive Preferences. 
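The threshold in Result 1 can be sanity-checked numerically for $v = (1, 2, 3)$, where it equals $(3 + 2 - 1)/(2 - 1) = 4$. The sketch below is only an illustration under stated assumptions: the text gives the supports of the segments but not their masses, so the masses in `two_segments` and `three_segments` are our own reconstruction from the monopolist's indifference conditions, and the function names are hypothetical.

```python
from fractions import Fraction as F

def lowest_optimal_price(values, masses):
    """Smallest revenue-maximizing price (masses need not sum to one)."""
    rev = [v * sum(masses[k:]) for k, v in enumerate(values)]
    best = max(rev)
    return min(v for v, r in zip(values, rev) if r == best)

def welfare(values, weights, segments):
    """Weighted consumer surplus summed over segments given in mass terms."""
    total = 0
    for seg in segments:
        p = lowest_optimal_price(values, seg)
        total += sum(l * m * max(0, v - p)
                     for l, m, v in zip(weights, seg, values))
    return total

values = [1, 2, 3]
# Case (i): supports {v1, v2, v3} and {v2, v3}; v2 stays optimal in both segments.
two_segments = [[F(3, 10), F(4, 15), F(1, 30)], [F(0), F(2, 15), F(4, 15)]]
# Case (ii): supports {v1, v2}, {v2, v3} and {v3}.
three_segments = [[F(3, 10), F(3, 10), F(0)],
                  [F(0), F(1, 10), F(1, 5)],
                  [F(0), F(0), F(1, 10)]]

for ratio in (3, 4, 5):
    weights = [ratio, ratio, 1]  # lambda_2 / lambda_3 = ratio
    gap = welfare(values, weights, three_segments) - welfare(values, weights, two_segments)
    print(ratio, gap)  # gap is -1/30, 0 and 1/30 respectively
```

Under these assumed masses, the three-segment split overtakes the two-segment one exactly when $\lambda_2/\lambda_3$ exceeds 4, in line with the result. The check compares only these two candidates and does not prove optimality.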


An important consequence of this result is that if the social planner's preferences 
are sufficiently redistributive, meaning that $\lambda_2$ is sufficiently greater than $\lambda_3$, the 
optimal segmentation might give a rent (i.e. an additional profit) to the monopolist. 
By packing more consumers with lower types together, the social planner also 
makes higher types more distinguishable, thus allowing the monopolist to raise their 
prices. The above example illustrates the main argument of the paper: while market 
segmentation can redistribute surplus without any loss of efficiency, sometimes 
raising the surplus of poorer consumers can only be done if some of the surplus 
from richer consumers is granted to the monopolist. 

However, not every aggregate market requires the granting of rents to the 
monopolist in order to satisfy redistributive objectives. Consider for instance the 
aggregate market $\mu^0 = (0.2, 0.65, 0.15)$, represented in the left panel of figure 3.5. 
The optimal segmentation of this market given any preferences $\lambda_2 > \lambda_3$ is the one 
depicted in the figure: it always generates a segment with $\{v_1, v_2\}$ and another 
one with $\{v_2, v_3\}$, and this segmentation is always uniformly weighted consumer-optimal. 
On this aggregate market, satisfying a redistributive objective never 
requires granting rents to the monopolist because it contains sufficiently many 
consumers of type $v_2$, such that even after pooling as many as possible of them 
with types $v_1$ in segment $\mu$, there are still sufficiently many types $v_2$ left to ensure 
that types $v_3$ will not be over-represented in segment $\mu'$. 

The result below characterizes the set of aggregate markets that, under a 
sufficiently strong redistributive motive, would require granting rents to the 
monopolist. We denote this set as the rent region. 


Result 2. The rent region is given by 

\[ \operatorname{int}\big(\operatorname{co}(\{\delta_3, \mu^{123}, \mu^{12}, \mu^{23}\})\big). \]


This result is illustrated in the right panel of figure 3.5, where the rent region is 
depicted in orange. Equivalently, the complement of this set denotes the aggregate 
markets for which any redistributive objective can be met without granting rents 
to the monopolist, that is, while maximizing total consumer surplus. We 
call this set the no-rent region. The following section generalizes the insights 
presented through this example. Section 3.4.1 generalizes the fact that optimal 
segmentations are efficient and include discount segments supported at markets at 
which the monopolist is indifferent between more than one price, while section 3.4.2 
establishes properties of optimal segmentations when the redistributive motive 
is sufficiently strong and shows how to construct optimal segmentations in this 
case. Finally, section 3.4.3 characterizes generally the no-rent and rent regions 
and shows that optimal segmentations for markets belonging to the no-rent region 
exhibit a very simple form, with only one discount segment and one uniform price 
segment. 

Figure 3.5: Rent Region. 


3.4. OPTIMAL SEGMENTATIONS 


We now turn to the analysis of the general case. In section 3.4.1 we derive general 
properties of optimal segmentations, that is, characteristics that are present in 
optimal segmentations given any decreasing welfare weights $\lambda$. Section 3.4.2 then 
constructs optimal segmentations under strongly redistributive preferences: when 
the weight assigned to lower types is sufficiently larger than the weight assigned 
to higher types. Finally, we characterize the set of aggregate markets for which 
satisfying a redistributive objective might require granting additional profits to the 
monopolist in section 3.4.3. 


3.4.1. General properties 


Efficient segmentations. Our first result echoes our analysis of efficiency in the 
three-value case and establishes that i) we can always restrict ourselves to efficient 
segmentations, as long as the weights are non-negative; ii) if the weights are all 
strictly positive (i.e. if $\lambda_K > 0$ under our assumption of decreasing weights), only 
efficient segmentations can be optimal. 


Proposition 12. For any aggregate market $\mu^0$ and any weights $\lambda \in \mathbb{R}^K_+$ (not 
necessarily decreasing), there exists an efficient optimal segmentation of $\mu^0$. 
Furthermore, if every weight is strictly positive, then any optimal segmentation is 
efficient. 


Proof. This result is a direct consequence of Proposition 1 in Haghpanah and 
Siegel (2022b), which itself follows from the proof of Theorem 1 in Bergemann 
et al. (2015). □ 


This result relies on the fact that any inefficient market can be segmented in a 
Pareto improving manner, that is, in a way that weakly increases the surplus of all 
consumers. Hence, as long as the social planner does not assign a negative weight 
to any consumer, there must be an efficient optimal segmentation. Proposition 12 
thus implies that segmenting in a redistributive manner never comes at the expense 
of efficiency. 
Direct segmentations. A segmentation $\sigma$ is direct if all segments in $\sigma$ have 
different prices, that is, if for any distinct $\mu, \mu' \in \operatorname{supp}(\sigma)$, $p(\mu) \neq p(\mu')$. Our next lemma 
shows that it is without loss of generality to focus on direct segmentations. 


Lemma 12. For any aggregate market $\mu^0$ and any segmentation $\sigma \in \Sigma(\mu^0)$, there 
exists a direct segmentation $\sigma' \in \Sigma(\mu^0)$ such that 

\[ \int_{M} W(\mu) \, \sigma(d\mu) = \int_{M} W(\mu) \, \sigma'(d\mu). \]


Proof. See appendix C.1.1. □ 


We further show that there always exists an optimal and direct segmentation 
that is only supported on the boundaries of price regions $\{M_k\}_{k \in \mathcal{K}}$. Let $\mathcal{K}^0 := \{k \in 
\mathcal{K} \mid v_k \in \operatorname{supp}(\mu^0)\}$ be the set of indices of consumers' types supported by $\mu^0$. 


Lemma 13. For any aggregate market $\mu^0$ that is not efficient, there exists an 
optimal direct segmentation supported on boundaries of sets $\{M_k\}_{k \in \mathcal{K}^0}$. 


Proof. See appendix C.1.2. □ 

This result implies that we can restrict attention, without loss of generality, to finitely 
supported segmentations. 


3.4.2. Strongly redistributive social preferences 


In this section, we derive some characteristics of the optimal segmentation when 
the social planner's preferences are strongly redistributive, that is, when the weights 
$\lambda$ are strongly decreasing in the type $v$. 


Definition 7. The weights $\lambda$ are $\kappa$-strongly redistributive if, for any $k < k' \leq K - 1$, 

\[ \frac{\lambda_k}{\lambda_{k'}} \geq \kappa. \]


That is, a social planner exhibits $\kappa$-strongly redistributive preferences ($\kappa$-SRP) 
if the weight she assigns to a consumer of type $v_k$ is at least $\kappa$ times larger than the 
weight she assigns to any consumer of type greater than $v_k$. 

Let us define the dominance ordering between any two sets. 


Definition 8. Let $X, Y \subset \mathbb{R}$. The set $X$ dominates $Y$, denoted $X \geq_D Y$, if for any 
$x \in X$ and any $y \in Y$, $x \geq y$.⁷ 


We can now state the main result of this section. 


⁷ Note that this definition of dominance is stronger than the strong set order in Topkis (1998). 

Figure 3.6: Structure of optimal segmentations under strong redistributive preferences. 


Proposition 13. For any aggregate market $\mu^0$ in the interior of $M$, there exists $\kappa$ 
such that if $\lambda$'s are $\kappa$-strongly redistributive, then for any optimal direct segmentation 
$\sigma \in \Sigma(\mu^0)$ and any markets $\mu, \mu' \in \operatorname{supp}(\sigma)$, $\mu \neq \mu'$: either $\operatorname{supp}(\mu) \geq_D \operatorname{supp}(\mu')$ 
or $\operatorname{supp}(\mu') \geq_D \operatorname{supp}(\mu)$. 


Proof. See appendix C.2.1. □ 


The result stated above establishes that, when the social planner’s preferences 
exhibit a sufficiently strong taste for redistribution, optimal segmentations divide 
the type space V into contiguous overlapping intervals, with the overlap between 
any two segments being composed of at most one type. The following corollary is 
a direct consequence of proposition 13: 


Corollary 5. For any aggregate market $\mu^0$ in the interior of $M$, there exists $\kappa$ such 
that if $\lambda$'s are $\kappa$-strongly redistributive, then for any optimal direct segmentation 
$\sigma \in \Sigma(\mu^0)$, any market $\mu \in \operatorname{supp}(\sigma)$ and any $k$ such that $\min\{\operatorname{supp}(\mu)\} < v_k < 
\max\{\operatorname{supp}(\mu)\}$: $\sigma(\mu) \mu_k = \mu^0_k$. 


The above result states that any segment $\mu$ belonging to a segmentation that 
is optimal under strong redistributive preferences contains all of the consumers 
with types strictly in-between $\min\{\operatorname{supp}(\mu)\}$ and $\max\{\operatorname{supp}(\mu)\}$. Together with 
proposition 13, it implies that, under $\kappa$-SRP optimal segmentations, every consumer 
type $v$ will belong to at most two segments: either it will belong to the interior 
of the support of a segment $\mu$, such that all consumers of this type have surplus 
$v - \min(\operatorname{supp}(\mu))$, or it will be the boundary type between two segments $\mu$ and 
$\mu'$, such that a fraction of these consumers (those belonging to segment $\mu$) gets 
surplus $v - \min(\operatorname{supp}(\mu))$ and the rest gets no surplus. The structure of optimal 
segmentations under strong redistributive preferences is illustrated in figure 3.6. 

These results, along with proposition 12, completely pin down the $\kappa$-SRP 
optimal direct segmentation. One can construct it through the following steps: 


• Step i) Start by creating a segment, call it $\mu_a$, with all consumers of type 
$v_1$. 

• Step ii) Proceed to including in $\mu_a$, successively, all consumers of type $v_2$, 
then all of the types $v_3$, and so on. From proposition 12 we know that $\mu_a$ must 
be efficient, meaning that we must have $p(\mu_a) = v_1$. As such, the process of 
inclusion of types higher than $v_1$ must be halted at the point at which adding 
a new consumer to $\mu_a$ would result in $v_1$ no longer being an optimal price in 
this segment. We denote by $v_{(a|b)}$ the type that was being included when the 
process was halted. 

• Step iii) Create a new segment, call it $\mu_b$, with all of the remaining types 
$v_{(a|b)}$. 

• Step iv) Proceed to including in $\mu_b$, successively, all consumers of type 
$v_{(a|b)+1}$, then all of the types $v_{(a|b)+2}$, and so on. Halt this process at the point 
at which adding a new consumer to $\mu_b$ would result in $v_{(a|b)}$ no longer being 
an optimal price in this segment. We denote by $v_{(b|c)}$ the type that was being 
included when the process was halted. 

• Step v) Create a new segment with all of the remaining types $v_{(b|c)}$. Repeat 
the process described in the last steps until every consumer has been allocated 
to a segment. 
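The steps above can be sketched as a greedy procedure in Python. This is a minimal illustration under our own assumptions, not the authors' implementation: segments are returned in mass terms (so they sum to the aggregate market), the function name `greedy_segmentation` is hypothetical, and the cap on each added type is derived from the requirement that the lowest price in the segment remains revenue-maximizing.

```python
from fractions import Fraction

def greedy_segmentation(values, masses):
    """Fill segments from the lowest type upwards, halting each segment at the
    point where its lowest price would stop being optimal (steps i to v)."""
    remaining = list(masses)
    n = len(values)
    zero = masses[0] * 0          # zero of the same numeric type as the input
    segments = []
    lo = 0
    while lo < n:
        if remaining[lo] == 0:    # nothing left of this type: move on
            lo += 1
            continue
        seg = [zero] * n
        seg[lo], remaining[lo] = remaining[lo], zero
        next_lo = n
        for j in range(lo + 1, n):
            if remaining[j] == 0:
                continue
            total = sum(seg)
            # Largest mass of type j keeping v_lo optimal against every v_i, i <= j.
            cap = min((values[lo] * total - values[i] * sum(seg[i:]))
                      / (values[i] - values[lo])
                      for i in range(lo + 1, j + 1))
            add = min(cap, remaining[j])
            seg[j] += add
            remaining[j] -= add
            if remaining[j] > 0:  # type j is the boundary type: open a new segment
                next_lo = j
                break
        segments.append(seg)
        lo = next_lo
    return segments

values = [1, 2, 3]
mu0 = [Fraction(3, 10), Fraction(2, 5), Fraction(3, 10)]
for seg in greedy_segmentation(values, mu0):
    print([str(m) for m in seg])
# ['3/10', '3/10', '0'], then ['0', '1/10', '1/5'], then ['0', '0', '1/10']
```

For $\mu^0 = (0.3, 0.4, 0.3)$ the sketch returns segments supported on $\{v_1, v_2\}$, $\{v_2, v_3\}$ and $\{v_3\}$, the supports listed in part (ii) of Result 1.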


3.4.3. Optimal segmentations and informational rents 


This section explores the question of when an optimal segmentation maximizes 
total consumer surplus or, conversely, when it grants a rent to the monopolist. 

Say that an aggregate market $\mu^0$ belongs to the rent region if there exists some $\kappa$ 
such that if the social planner has $\kappa$-strongly redistributive preferences, the optimal 
segmentation grants a rent to the monopolist. Conversely, call no-rent region 
the set of aggregate markets for which any optimal segmentation with redistributive 
preferences also maximizes total consumer surplus. 

Before we characterize the rent and no-rent regions, we define a particular 
segmentation, which we will call $\sigma^{NR}$: 


Definition 9. Let $\mu^0$ be an aggregate market with uniform price $v_u$. Call $\sigma^{NR}$ the 
segmentation that splits $\mu^0$ into two segments $\mu^s$ and $\mu^r$, such that: 

\[ \mu^s = \Big( \frac{\mu^0_1}{\sigma}, \dots, \frac{\mu^0_{u-1}}{\sigma}, \mu^s_u, 0, \dots, 0 \Big), \]
\[ \mu^r = \Big( 0, \dots, 0, \mu^r_u, \frac{\mu^0_{u+1}}{1 - \sigma}, \dots, \frac{\mu^0_K}{1 - \sigma} \Big), \]

where $\mu^s_u = v_1 / v_u$, $\mu^r_u = (\mu^0_u - \sigma \mu^s_u)/(1 - \sigma)$ and $\sigma = \big(v_u \sum_{k=1}^{u-1} \mu^0_k\big) / (v_u - v_1)$ is the mass of segment $\mu^s$. 

Figure 3.7: Segmentation $\sigma^{NR}$. 


Segmentation $\sigma^{NR}$ is very simple and generates only two segments: one pooling 
all the consumers who would not buy the good on the unsegmented market (those 
with type lower than $v_u$) and another one pooling all the consumers who would 
buy the good on the unsegmented market (those with type higher than $v_u$). Under 
segmentation $\sigma^{NR}$, the only consumer type that gets assigned to two different 
segments is $v_u$. 


Proposition 14. An aggregate market $\mu^0$ belongs to the no-rent region if and only 
if $\sigma^{NR}$ is an efficient segmentation of $\mu^0$. 


Proof. See appendix C.3.1. □ 


Proposition 14 establishes a simple criterion that defines whether an aggregate 
market belongs to the no-rent region: it suffices to check if, under $\sigma^{NR}$, $p(\mu^s) = v_1$ 
and $p(\mu^r) = v_u$. Whenever this is not true, the aggregate market belongs to the 
rent region. 
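The criterion of Proposition 14 lends itself to a direct numerical check. The sketch below is an illustration under our own assumptions: it works with segments in mass terms rather than normalized markets, assumes a full-support aggregate market, treats a market whose uniform price is already $v_1$ as belonging to the no-rent region, and treats $\sigma^{NR}$ as failing the test when there are not enough consumers of type $v_u$ to build the lower segment. All function names are hypothetical.

```python
from fractions import Fraction

def lowest_optimal_index(values, masses):
    """Index of the smallest revenue-maximizing price."""
    rev = [v * sum(masses[k:]) for k, v in enumerate(values)]
    return rev.index(max(rev))  # index() returns the first (lowest) maximizer

def sigma_nr(values, masses):
    """Lower and upper segments of sigma^NR in mass terms, or None if undefined."""
    u = lowest_optimal_index(values, masses)
    if u == 0:
        return None
    below = sum(masses[:u])
    # Mass of type v_u making the monopolist indifferent between v_1 and v_u.
    m_u = values[0] * below / (values[u] - values[0])
    if m_u > masses[u]:
        return None
    lower = list(masses[:u]) + [m_u] + [0] * (len(values) - u - 1)
    upper = [0] * u + [masses[u] - m_u] + list(masses[u + 1:])
    return lower, upper

def in_no_rent_region(values, masses):
    """Check p(lower) = v_1 and p(upper) = v_u under sigma^NR."""
    u = lowest_optimal_index(values, masses)
    if u == 0:
        return True   # assumption: the market is already efficient
    split = sigma_nr(values, masses)
    if split is None:
        return False  # assumption: an infeasible sigma^NR counts as not efficient
    lower, upper = split
    return (lowest_optimal_index(values, lower) == 0
            and lowest_optimal_index(values, upper) == u)

values = [1, 2, 3]
print(in_no_rent_region(values, [Fraction(3, 10), Fraction(2, 5), Fraction(3, 10)]))    # False
print(in_no_rent_region(values, [Fraction(1, 5), Fraction(13, 20), Fraction(3, 20)]))   # True
```

The two outputs agree with the three-value examples of section 3.3: $(0.3, 0.4, 0.3)$ lies in the rent region, whereas $(0.2, 0.65, 0.15)$ does not.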


Corollary 6. Consider an aggregate market $\mu^0$. If $\sigma^{NR}$ is not an efficient 
segmentation of $\mu^0$, then there exists $\kappa$ such that, if welfare weights $\lambda$ are $\kappa$-strongly 
redistributive, any optimal segmentation grants a rent to the monopolist. 


The intuition for the results above is as follows. A market belongs to the no-rent 
region if, given any redistributive preferences, its optimal segmentation maximizes 
total consumer surplus. On the one hand, we know from proposition 13 that, under 
strong redistributive preferences, optimal segmentations divide the type space into 
overlapping intervals, with the overlap between two segments being composed of at 
most one type. On the other hand, a necessary and sufficient condition 
for total consumer surplus to be maximized is that i) the segmentation is efficient 
and ii) the uniform price $v_u$ is an optimal price at every segment generated by this 
segmentation. Condition i) ensures that total surplus is maximized, while condition 
ii) ensures that producer surplus is kept at its uniform price level, meaning that all 
of the surplus created by the segmentation goes to consumers. Since condition 
ii) can only be satisfied if type $v_u$ belongs to the support of all segments, we get 
that the conditions for optimality under strong redistributive preferences and for 
total consumer surplus to be maximized can only be simultaneously met by a 
segmentation that only generates two segments, with the overlap in the support of 
both segments being composed of $v_u$. 

Such a segmentation indeed maximizes total consumer surplus if it is efficient
and if $v_u$ is an optimal price in both segments. This is the case if $v_1$ and $v_u$ are both
optimal prices in the lower segment, and if $v_u$ is an optimal price in the
upper segment. Segmentation $\sigma^{NR}$ is the only segmentation that can potentially
satisfy all of these conditions at once, as it includes in the lower segment the
exact proportion of types $v_u$ that would make the monopolist indifferent between
charging a price of $v_1$ or $v_u$. As such, segmentation $\sigma^{NR}$ maximizes total consumer
surplus if and only if it is efficient.
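The construction of the discount segment can be illustrated numerically. The following is a minimal sketch on a hypothetical three-type market: the function names, the market itself and the use of exact fractions are illustrative assumptions. The only ingredient taken from the text is the indifference condition between prices $v_1$ and $v_u$ in the lower segment, solved here for the mass `m` of types $v_u$ placed in that segment.

```python
from fractions import Fraction

def no_rent_split(values, masses, u):
    """Split the aggregate market into (discount segment, residual segment).

    values: increasing types v_1 < ... < v_K; masses: their aggregate masses;
    u: index of the uniform monopoly price v_u (taken as given here).
    """
    v1, vu = values[0], values[u]
    low_mass = sum(masses[:u])            # types excluded at the uniform price
    # Indifference in the discount segment: v_1 * (low_mass + m) = v_u * m.
    m = v1 * low_mass / (vu - v1)
    if m > masses[u]:
        raise ValueError("not enough mass on v_u to reach indifference")
    lower = list(masses[:u]) + [m] + [0] * (len(values) - u - 1)
    upper = [total - part for total, part in zip(masses, lower)]
    return lower, upper

def revenue(values, segment, k):
    """Monopoly revenue in `segment` from posting price values[k]."""
    return values[k] * sum(segment[k:])

def optimal_prices(values, segment):
    """Indices of all revenue-maximizing prices in `segment`."""
    best = max(revenue(values, segment, k) for k in range(len(values)))
    return [k for k in range(len(values)) if revenue(values, segment, k) == best]

# Hypothetical market: types 1, 3, 4 with masses 1/5, 2/5, 2/5 (uniform price is 3).
values = [1, 3, 4]
masses = [Fraction(1, 5), Fraction(2, 5), Fraction(2, 5)]
lower, upper = no_rent_split(values, masses, u=1)
print(lower, optimal_prices(values, lower))   # prices v_1 and v_u tie in the discount segment
print(upper, optimal_prices(values, upper))   # v_u is optimal in the residual segment
```

In this example both segments are served entirely at their optimal price, so the split is efficient and the monopolist's profit stays at its uniform-price level.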


Corollary 7. If an aggregate market $\mu^*$ belongs to the no-rent region, then $\sigma^{NR}$ is
its only direct consumer-optimal segmentation under any redistributive preferences.


This result establishes that, for markets in the no-rent region, optimal segmentations
have an extremely simple structure: they only generate a discount segment
with price $v_1$, pooling all the types who would not consume under the uniform price
and some of the types $v_u$, and a residual segment with price $v_u$, containing all of the
remaining consumers. Furthermore, this segmentation must be optimal under any
decreasing welfare weights $\lambda$. As such, this result selects, for the markets belonging
to the no-rent region, one among the many uniformly weighted consumer-optimal
segmentations that were outlined in Bergemann et al. (2015).

Due to the structure of segmentation $\sigma^{NR}$, all of the surplus that is generated
by the segmentation is given to consumers with types below or equal to $v_u$, all of
whom get the maximum surplus they could potentially get. Since it is impossible to
raise the surplus of any type below $v_u$, and impossible to raise the surplus of types
above $v_u$ without redistributing from lower to higher types, this segmentation must
be optimal whenever the weights assigned to different consumers are (weakly)
decreasing in the type.

The results in this section establish that there are essentially two types of 
markets: those for which redistribution can be done only within consumers, while 
keeping total consumer surplus maximal, and those for which increasing the surplus 
of lower types past a certain point necessarily decreases the total pie of surplus 
accruing to consumers and grants additional profits to the monopolist. 
Appendices


A. MATHEMATICAL APPENDIX FOR
CHAPTER 1


A.1. PROOF OF PROPOSITION 1


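Let $(\sigma, \tau)$ be an arbitrary $t$-pass-fail mechanism. First, remark that, by lemma 1, it would
never be optimal for the designer to choose a cutoff $t < 0$. Thus, we can
optimize over the class of pass-fail mechanisms with positive cutoffs without loss
of optimality. The payoff of the designer under any $t$-pass-fail mechanism with $t \geq 0$ is given
by the following function of the single variable $t$:

\[
V(t) =
\begin{cases}
t\,\big(F(t) - F(\theta(t))\big) + \displaystyle\int_t^{\bar\theta} \theta f(\theta)\, d\theta & \text{if } t \in [0, \bar\theta[ \\[1.5ex]
t\,\big(1 - F(\theta(t))\big) & \text{if } t \in [\bar\theta, \bar\theta + \sqrt{2/\gamma}]
\end{cases}
\]

Remembering that $\theta(t) = t - \sqrt{2/\gamma}$, we can deduce that the function
$V\colon [0, \bar\theta + \sqrt{2/\gamma}] \to \mathbb{R}$ is continuously differentiable with derivative given by:

\[
V'(t) =
\begin{cases}
F(t) - F(\theta(t)) - t f(\theta(t)) & \text{if } t \in [0, \bar\theta[ \\
1 - F(\theta(t)) - t f(\theta(t)) & \text{if } t \in [\bar\theta, \bar\theta + \sqrt{2/\gamma}]
\end{cases}
\]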


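If an optimal threshold $t_\gamma$ exists, it must therefore satisfy the following first-order
condition:
\[ V'(t) = 0. \tag{FOC} \]

Consider now the following function:

\[
\psi(t) =
\begin{cases}
t - \dfrac{F(t) - F(\theta(t))}{f(\theta(t))} & \text{if } t \in [0, \bar\theta[ \\[2ex]
t - \dfrac{1 - F(\theta(t))}{f(\theta(t))} & \text{if } t \in [\bar\theta, \bar\theta + \sqrt{2/\gamma}]
\end{cases}
\]

It is easy to see that the equation (FOC) admits the same solution as the equation

\[ \psi(t) = 0 \tag{FOC'} \]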
if it exists. The function $\psi$ is continuous over the interval $[0, \bar\theta + \sqrt{2/\gamma}]$ and
satisfies

\[ \psi(0) = -\frac{F(0) - F(-\sqrt{2/\gamma})}{f(-\sqrt{2/\gamma})} < 0 \]

as well as

\[ \psi\big(\bar\theta + \sqrt{2/\gamma}\big) = \bar\theta + \sqrt{2/\gamma} > 0. \]

Hence, by the Intermediate Value Theorem, a solution to equation (FOC') must
exist in the interval $]0, \bar\theta + \sqrt{2/\gamma}[$. Remark also that, under assumption 1, the
function $\psi$ is strictly increasing on that interval (whenever $f$ is not the uniform
distribution), since:


f(A(t)) f(A(t)) f(A(t)) 
—>JJSe—’ —v-__——’ —e._ ———— 
<1 <0 >0 


fo) J+ [ee Pn eO)) 50 


W(t) = q = 


for all t € [0, 6] and 


reo) 1 - F((1)) | “ 


f(O(t)) f(A(t)) 
—S——’”/ —_—_ eee’ 
<0 >0 


w'(t) = | 


for all $t \in [\bar\theta, \bar\theta + \sqrt{2/\gamma}]$. This implies that the solution to equation (FOC') must
be unique in the interval $]0, \bar\theta + \sqrt{2/\gamma}[$.

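Let us prove additionally that $\psi$ must achieve its unique zero in the interval
$]0, \sqrt{2/\gamma}[$ for any $\gamma > 0$. Remark first that:

\[
\psi\big(\sqrt{2/\gamma}\big) =
\begin{cases}
\sqrt{\dfrac{2}{\gamma}} - \dfrac{1 - F(0)}{f(0)} & \text{if } \gamma \in ]0, 1/c(\bar\theta, 0)[ \\[2ex]
\sqrt{\dfrac{2}{\gamma}} - \dfrac{F(\sqrt{2/\gamma}) - F(0)}{f(0)} & \text{if } \gamma \in [1/c(\bar\theta, 0), +\infty[
\end{cases}
\]

Under assumption 1 we must have:

\[
\frac{\partial \psi\big(\sqrt{2/\gamma}\big)}{\partial \gamma} =
\begin{cases}
-\dfrac{1}{\gamma\sqrt{2\gamma}} & \text{if } \gamma \in ]0, 1/c(\bar\theta, 0)[ \\[2ex]
-\dfrac{1}{\gamma\sqrt{2\gamma}} \left(1 - \dfrac{f(\sqrt{2/\gamma})}{f(0)}\right) & \text{if } \gamma \in [1/c(\bar\theta, 0), +\infty[
\end{cases}
\]

which implies that $\psi(\sqrt{2/\gamma})$ is strictly decreasing in $\gamma$. Moreover, we have:

\[ \lim_{\gamma \to 0^+} \psi\big(\sqrt{2/\gamma}\big) = +\infty \]

as well as

\[ \lim_{\gamma \to +\infty} \psi\big(\sqrt{2/\gamma}\big) = 0. \]

All those facts put together imply that $\psi(\sqrt{2/\gamma}) > 0$ for all $\gamma > 0$. Whence, again
by the Intermediate Value Theorem, the optimal cutoff $t_\gamma$ must lie in the interval
$]0, \sqrt{2/\gamma}[$ for any $\gamma > 0$. This also implies that $\theta_\gamma = \theta(t_\gamma) \in ]-\sqrt{2/\gamma}, 0[$ for any
$\gamma$.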
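The existence and uniqueness argument can be checked numerically on an example. The sketch below is illustrative only: the truncated exponential density on $[-2, 2]$, the value $\gamma = 2$ and all function names are assumptions that do not come from the text. It evaluates $\psi(t) = t - (F(t) - F(\theta(t)))/f(\theta(t))$, with $F(t)$ capped at 1 above $\bar\theta$, and locates its zero by bisection on $[0, \bar\theta + \sqrt{2/\gamma}]$.

```python
import math

# Illustrative primitives (assumptions): a truncated exponential density,
# which is decreasing on its support [THETA_LO, THETA_HI].
THETA_LO, THETA_HI = -2.0, 2.0
GAMMA = 2.0
NORM = math.exp(-THETA_LO) - math.exp(-THETA_HI)

def density(theta):
    return math.exp(-theta) / NORM

def cdf(theta):
    theta = min(max(theta, THETA_LO), THETA_HI)   # F = 0 below, 1 above the support
    return (math.exp(-THETA_LO) - math.exp(-theta)) / NORM

def psi(t, gamma=GAMMA):
    """psi(t) = t - (F(t) - F(theta(t))) / f(theta(t)), with theta(t) = t - sqrt(2/gamma)."""
    theta_t = t - math.sqrt(2.0 / gamma)          # last approved type
    return t - (cdf(t) - cdf(theta_t)) / density(theta_t)

def bisect(fn, lo, hi, tol=1e-12):
    """Zero of a continuous function with fn(lo) < 0 < fn(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

spread = math.sqrt(2.0 / GAMMA)
t_star = bisect(psi, 0.0, THETA_HI + spread)
theta_star = t_star - spread
print(t_star, theta_star)
```

For this particular density, $\psi(t) = t - (1 - e^{-\sqrt{2/\gamma}})$ on $[0, \bar\theta[$, so the computed cutoff lies in $]0, \sqrt{2/\gamma}[$ and the last approved type is negative, in line with the statement above.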


A.2. ADDITIONAL PROOFS FOR THEOREM 1


A.2.1. Proof of lemma 1 


We prove successively each of the properties:


(i) Let us first prove that if $\sigma$ is optimal it must assign zero probability to strictly
negative types. Assume that $\sigma$ is optimal and $\sigma(t) > 0$ for some $t \in [\underline{t}, 0[$.
Let $\tau_\sigma(\theta) \in \arg\max_{t \in T} \sigma(t) - \gamma c(t, \theta)$ and define the sets
\[ \Theta_\sigma^- = \{\theta \in \Theta \mid \tau_\sigma(\theta) < 0\}, \]
and
\[ \Theta_\sigma^+ = \{\theta \in \Theta \mid \tau_\sigma(\theta) \geq 0\}. \]
Remark that we can always write the designer's payoff as
\[
V(\sigma) = \int_{\Theta_\sigma^-} \tau_\sigma(\theta)\, \sigma(\tau_\sigma(\theta))\, f(\theta)\, d\theta
+ \int_{\Theta_\sigma^+} \tau_\sigma(\theta)\, \sigma(\tau_\sigma(\theta))\, f(\theta)\, d\theta.
\]
By definition, the first integral is negative while the second is positive. Now,
define $\varphi$ to be such that $\varphi(t) = 0$ for any $t \in ]-\infty, 0[$ and $\varphi(t) = \sigma(t)$ for
any $t \in [0, +\infty[$. If $\Theta_\sigma^-$ is empty, then moving from $\sigma$ to $\varphi$ would not change
the designer's expected payoff and $\varphi$ would still be optimal. If $\Theta_\sigma^-$ is non-empty,
then moving from $\sigma$ to $\varphi$ brings the value of the first integral to zero and
can have only two effects on the second integral, since types in $\Theta_\sigma^+$ have
unchanged incentives. First, it may be that no type in $\Theta_\sigma^-$ would invest in a positive
final type under $\varphi$, implying $\Theta_\varphi^+ = \Theta_\sigma^+$ and thus keeping the value of the
second integral unchanged while bringing the value of the first integral to
zero. Second, some type in $\Theta_\sigma^-$ could be willing to invest in a positive
type under $\varphi$, implying $\Theta_\sigma^+ \subset \Theta_\varphi^+$ and thus increasing the value of the second
integral. In both cases, we have $V(\varphi) > V(\sigma)$, which contradicts the
optimality of $\sigma$.


(ii) Let us now show that $\sigma$ must always be increasing. Let $(\sigma, \tau)$ be any
mechanism respecting (IC). Since the agents' cost has decreasing differences,
the function $\tau$ must be non-decreasing in $\theta$ by Topkis' theorem. Together
with (IC), this implies

\[ \sigma(\tau(\theta)) - \sigma(\tau(\theta')) \geq \gamma \big(c(\tau(\theta), \theta) - c(\tau(\theta'), \theta)\big) \geq 0, \]

for any $\theta > \theta'$. The second inequality comes from the fact that $c(\cdot, \theta)$ is
an increasing function for any $\theta$. As a result, $\sigma$ must be a non-decreasing
function.


A.2.2. Proof of lemma 3 


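Necessity. It follows directly from proposition 2 in Rochet (1987) that property
(i) is necessary and sufficient for (IC), so let us prove property (ii) first. Remark that
the lowest (resp. highest) recommendation rule in the class of monotone selection
rules is given by $\sigma(t) = 0$ for all $t \in T$ (resp. $\sigma(t) = \mathbb{1}\{t \geq 0\}$ for any $t \in T$).
Hence, for any monotone selection rule $\sigma$ and any $\theta \in [\theta_0, \bar\theta]$, we must have:

\[
\underbrace{\max_{t \in T} \Big\{ t\theta - \frac{t^2}{2} \Big\}}_{= \underline{u}(\theta)}
\;\leq\;
\underbrace{\max_{t \in T} \Big\{ \frac{\sigma(t)}{\gamma} + t\theta - \frac{t^2}{2} \Big\}}_{= u(\theta)}
\;\leq\;
\underbrace{\max_{t \in T} \Big\{ \frac{\mathbb{1}\{t \geq 0\}}{\gamma} + t\theta - \frac{t^2}{2} \Big\}}_{= \bar{u}(\theta)}.
\]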


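Next, let us prove property (iii). If a mechanism $(\sigma, \tau)$ is admissible it must
satisfy (IC). Hence, by property (i), we have $u'(\theta) = \tau(\theta)$ almost everywhere on
$[\theta_0, \bar\theta]$ and corollary 2 implies directly that $\theta \leq u'(\theta) \leq \bar{t}(\theta)$ for almost every
$\theta \in [\theta_0, \bar\theta]$. Let us now prove property (iv). If $(\sigma, \tau)$ is admissible, then we can
again substitute $\tau(\theta)$ by $u'(\theta)$ almost everywhere on $[\theta_0, \bar\theta]$. Remembering that
$U(\theta) = \gamma(u(\theta) - \theta^2/2)$ and remarking that $\sigma(\tau(\theta)) = U(\theta) + \gamma c(\tau(\theta), \theta)$, some
algebra shows that:

\[ \sigma(\tau(\theta)) = \gamma \left[ u(\theta) + \frac{u'(\theta)^2}{2} - \theta u'(\theta) \right], \]

for almost every $\theta \in [\theta_0, \bar\theta]$. Since $\sigma(t) \leq 1$ for all $t \in T$ we must have:

\[ u(\theta) + \frac{u'(\theta)^2}{2} - \theta u'(\theta) \leq \frac{1}{\gamma}, \]

for almost every $\theta \in [\theta_0, \bar\theta]$. Next, we prove property (v). Since $(\sigma, \tau)$ is admissible,
there must exist a $\theta^\dagger$ such that $u'(\theta) = \theta$ for almost all $\theta \in [\theta_0, \theta^\dagger[$ by (IC) and
corollary 2. Moreover, since $u$ is convex by (IC), it must also be absolutely
continuous and thus equal to the integral of its derivative. Hence:

\[ u(\theta) = u(\theta_0) + \int_{\theta_0}^{\theta} z \, dz = u(\theta_0) - \frac{\theta_0^2}{2} + \frac{\theta^2}{2}, \]

for all $\theta \in [\theta_0, \theta^\dagger[$. Let us prove that $u(\theta_0) = \theta_0^2/2$. First of all, remember that
$\theta_0 = -\sqrt{2/\gamma} < 0$. Since $\sigma$ is monotone, either $\sigma(0) = 1$, in which case $\tau(\theta_0) = 0$
and $u(\theta_0) = 1/\gamma = \theta_0^2/2$, or $0 \leq \sigma(0) < 1$, in which case $\tau(\theta_0) = \theta_0$, which implies
that $u(\theta_0) = \theta_0^2/2$. As a convex function, $u$ must also be continuous, implying that
$u(\theta^\dagger) = \underline{u}(\theta^\dagger)$. Therefore, we must have $u(\theta) = \underline{u}(\theta)$ for all $\theta \in [\theta_0, \theta^\dagger]$. We can
deduce directly from (IC) and corollary 2 that $u'(\theta) \geq 0$ for almost all $\theta \in ]\theta^\dagger, \bar\theta]$.
Therefore, starting from $\theta^\dagger$, the function $u$ must be increasing, which also implies
that $u(\theta) \geq u(\theta^\dagger)$ for all $\theta \in ]\theta^\dagger, \bar\theta]$. Finally, let us prove property (vi). Assume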
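that there exists $\hat\theta \in [\theta_0, \bar{t}^{-1}(\bar\theta)]$ such that $u'(\hat\theta) = \bar{t}(\hat\theta)$. Corollary 2 and (IC) imply
that:

\[
u'(\theta) =
\begin{cases}
\bar{t}(\hat\theta) & \text{if } \theta \in [\hat\theta, \bar{t}(\hat\theta)[ \\
\theta & \text{if } \theta \in [\bar{t}(\hat\theta), \bar\theta]
\end{cases}
\]

Using again the absolute continuity of $u$, we obtain:

\[
u(\theta) = u(\hat\theta) + \int_{\hat\theta}^{\theta} \bar{t}(\hat\theta)\, dz
= u(\hat\theta) + \bar{t}(\hat\theta)(\theta - \hat\theta) \tag{A.1}
\]

for all $\theta \in [\hat\theta, \bar{t}(\hat\theta)[$, as well as:

\[
u(\theta) = u(\bar{t}(\hat\theta)) + \int_{\bar{t}(\hat\theta)}^{\theta} z\, dz
= u(\bar{t}(\hat\theta)) - \frac{\bar{t}(\hat\theta)^2}{2} + \frac{\theta^2}{2} \tag{A.2}
\]

for all $\theta \in [\bar{t}(\hat\theta), \bar\theta]$. Let us now prove that $u(\hat\theta) = \underline{u}(\hat\theta)$ and $u(\bar{t}(\hat\theta)) = \bar{u}(\bar{t}(\hat\theta))$.
Assume that $u(\hat\theta) > \underline{u}(\hat\theta)$. By equation (A.1) we have:

\[
u(\bar{t}(\hat\theta)) = u(\hat\theta) + \bar{t}(\hat\theta)\big(\bar{t}(\hat\theta) - \hat\theta\big)
> \underline{u}(\hat\theta) + \bar{t}(\hat\theta)\big(\bar{t}(\hat\theta) - \hat\theta\big). \tag{A.3}
\]

Expanding the right-hand side of the previous inequality, and using $\bar{t}(\hat\theta) - \hat\theta = \sqrt{2/\gamma}$, we obtain:

\[
\underline{u}(\hat\theta) + \bar{t}(\hat\theta)\big(\bar{t}(\hat\theta) - \hat\theta\big)
= \frac{\hat\theta^2}{2} + \bar{t}(\hat\theta)\sqrt{\frac{2}{\gamma}}
= \frac{1}{\gamma} + \frac{\bar{t}(\hat\theta)^2}{2}
= \bar{u}(\bar{t}(\hat\theta)). \tag{A.4}
\]

Combining equations (A.3) and (A.4) leads to $u(\bar{t}(\hat\theta)) > \bar{u}(\bar{t}(\hat\theta))$, a contradiction.
Therefore, $u(\hat\theta) = \underline{u}(\hat\theta)$. Equation (A.4) then implies directly that $u(\bar{t}(\hat\theta)) =
\bar{u}(\bar{t}(\hat\theta)) = 1/\gamma + \bar{t}(\hat\theta)^2/2$, which, together with equation (A.2), implies that:

\[ u(\theta) = \frac{1}{\gamma} + \frac{\theta^2}{2} = \bar{u}(\theta), \]

for all $\theta \in [\bar{t}(\hat\theta), \bar\theta]$. Finally, by continuity of $u$ and property (v), we must have that
$u(\theta) = \underline{u}(\theta)$ for all $\theta \in [\theta_0, \hat\theta[$. The proof for the case where $\bar{t}(\hat\theta) > \bar\theta$ is analogous
and omitted.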
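The algebra behind property (iv) can be spelled out as follows. This is a worked sketch under the assumption that the cost is quadratic, $c(t, \theta) = (t - \theta)^2/2$; that functional form is inferred from the expressions $\theta(t) = t - \sqrt{2/\gamma}$ and $U(\theta) = \gamma(u(\theta) - \theta^2/2)$ used in this appendix and is not restated in this section. It also uses the envelope condition $\tau(\theta) = u'(\theta)$ from property (i).

```latex
\begin{align*}
\sigma(\tau(\theta))
  &= U(\theta) + \gamma\, c(\tau(\theta), \theta) \\
  &= \gamma\Big(u(\theta) - \tfrac{\theta^2}{2}\Big)
     + \tfrac{\gamma}{2}\big(u'(\theta) - \theta\big)^2 \\
  &= \gamma\Big(u(\theta) - \tfrac{\theta^2}{2}
     + \tfrac{u'(\theta)^2}{2} - \theta u'(\theta) + \tfrac{\theta^2}{2}\Big) \\
  &= \gamma\Big(u(\theta) + \tfrac{u'(\theta)^2}{2} - \theta u'(\theta)\Big).
\end{align*}
```

The constraint $\sigma(t) \leq 1$ then translates into the upper bound $1/\gamma$ on the bracketed term.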


Sufficiency. To prove sufficiency, we must show that if a pseudo-utility function
satisfies properties (i) to (v) in lemma 3, then the mechanism $(\sigma, \tau)$ which induces
it must be admissible. In particular, it suffices to show that $\sigma$ is monotone as
defined in lemma 1, since it would directly imply that $\tau$ satisfies all the properties
in corollary 2. Let $u \in \mathcal{U}$, where $\mathcal{U}$ denotes the set of functions satisfying all the
properties in lemma 3, and define the function $x(\theta) = \gamma\big(u(\theta) + u'(\theta)^2/2 - \theta u'(\theta)\big)$
for any $\theta \in [\theta_0, \bar\theta]$. First, since $u$ is convex, it must be twice differentiable almost
everywhere on $[\theta_0, \bar\theta]$ by Alexandrov's theorem (see Aliprantis and Border, 2006,
Theorem 7.28). Hence, $x'$ exists almost everywhere on $[\theta_0, \bar\theta]$ and is given by:

\[
x'(\theta) = \gamma \big( u'(\theta) + u''(\theta) u'(\theta) - u'(\theta) - \theta u''(\theta) \big)
= \gamma u''(\theta) \big( u'(\theta) - \theta \big)
\]

for almost every $\theta \in [\theta_0, \bar\theta]$. Hence $x'(\theta) \geq 0$ almost everywhere since $u''(\theta) \geq 0$
and $u'(\theta) \geq \theta$ almost everywhere on $[\theta_0, \bar\theta]$. Remarking that $x(\theta) = \sigma(\tau(\theta))$
almost everywhere on $[\theta_0, \bar\theta]$ and that $\tau$ must be increasing on $[\theta_0, \bar\theta]$ by (IC),
we can conclude that the function $\sigma$ must be increasing on $T$. Second, for any
$\theta \in [\theta_0, \bar\theta]$ let $g(\cdot, \theta)$ be the function defined by

\[ g(x, \theta) = \frac{x^2}{2} - \theta x \]

for any $x \in [\theta, \bar{t}(\theta)]$. The function $g(\cdot, \theta)$ is differentiable and

\[ g_x(x, \theta) = x - \theta \geq 0 \]

for any $x \in [\theta, \bar{t}(\theta)]$. Hence, $g(\cdot, \theta)$ is increasing over the interval $[\theta, \bar{t}(\theta)]$ for any
$\theta \in [\theta_0, \bar\theta]$ and is thus minimized at $x = \theta$, where $g(\theta, \theta) = -\underline{u}(\theta)$. Since $u(\theta) \geq \underline{u}(\theta)$
and $u'(\theta) \geq \theta = \underline{u}'(\theta)$ we have:

\[
x(\theta) = \gamma \big( u(\theta) + g(u'(\theta), \theta) \big)
\geq \gamma \big( \underline{u}(\theta) + g(\underline{u}'(\theta), \theta) \big) = 0,
\]

for all $\theta \in [\theta_0, \bar\theta]$, which implies that $\sigma(t) \geq 0$ for all $t \in T$. Moreover, property
(iv) directly implies that $x(\theta) \leq 1$, implying that $\sigma(t) \leq 1$ for all $t \in T$. Finally,
let us prove that $\sigma(t) = 0$ for all $t \in [\underline{t}, 0[$. The convexity of $u$ also implies that
the left limit of $u'$ exists everywhere on $[\theta_0, \bar\theta]$. In particular, since $u(\theta_0) = \theta_0^2/2$
we have $\lim_{\theta \to \theta_0} u'(\theta) = \theta_0$ and therefore:

\[
\lim_{\theta \to \theta_0} x(\theta) = \gamma \left( \frac{\theta_0^2}{2} + \frac{\theta_0^2}{2} - \theta_0^2 \right) = 0,
\]

which implies that $\sigma(t) = 0$ for all $t \in [\underline{t}, 0[$ since $x$ is increasing and bounded
below by 0.
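A.2.3. Proof of lemma 4


Let us first prove convexity. Fix any $\theta^\dagger \in [\theta_0, \bar\theta]$ and let $u_1, u_2 \in \mathcal{U}(\theta^\dagger)$ and
$\lambda \in [0, 1]$. It is immediate that the function $w = \lambda u_1 + (1 - \lambda) u_2$ satisfies properties
(ii) and (iii). Next, remark that:

\[
w(\theta) + \frac{w'(\theta)^2}{2} - \theta w'(\theta)
\leq \lambda \left[ u_1(\theta) + \frac{u_1'(\theta)^2}{2} - \theta u_1'(\theta) \right]
+ (1 - \lambda) \left[ u_2(\theta) + \frac{u_2'(\theta)^2}{2} - \theta u_2'(\theta) \right]
\leq \frac{1}{\gamma},
\]

where the first inequality is obtained by applying Jensen's inequality to the term
$w'(\theta)^2/2$ and the second inequality is implied by property (iv) in lemma 3. Let
us now prove compactness. First, remark that $\mathcal{U}(\theta^\dagger)$ is a closed subspace of
$C([\theta^\dagger, \bar\theta])$ because (i) every convex function is continuous, (ii) it is defined by
closed inequalities, and (iii) for any uniformly convergent sequence $(u_n)_{n \in \mathbb{N}}$ of
convex functions on the compact set $[\theta^\dagger, \bar\theta]$, the sequence of derivatives $(u_n')_{n \in \mathbb{N}}$
must converge uniformly to $u'$ by Theorem 25.7 in Rockafellar (1970). Second,
the set $\mathcal{U}(\theta^\dagger)$ is uniformly bounded: if $\theta \in [\theta^\dagger, \bar\theta]$ then $|u(\theta)| \leq \bar{u}(\bar\theta)$. Third, the
space $\mathcal{U}(\theta^\dagger)$ is equicontinuous. Indeed, if $\theta \in [\theta^\dagger, \bar\theta]$ and $u \in \mathcal{U}(\theta^\dagger)$ then

\[ |u'(\theta)| \leq \bar{t}(\bar\theta). \]

From the Mean Value Theorem we can infer that

\[ |u(\theta) - u(\theta')| \leq \bar{t}(\bar\theta)\, |\theta - \theta'|, \]

for any $u \in \mathcal{U}(\theta^\dagger)$ and any $\theta, \theta' \in [\theta^\dagger, \bar\theta]$. Therefore, for any $u \in \mathcal{U}(\theta^\dagger)$, any
$\theta \in [\theta^\dagger, \bar\theta]$ and any $\varepsilon > 0$, taking $\delta = \varepsilon / \bar{t}(\bar\theta)$ implies

\[ |\theta - \theta'| < \delta \implies |u(\theta) - u(\theta')| < \varepsilon. \]

Hence, by the Arzelà-Ascoli theorem (Royden and Fitzpatrick, 2010, Theorem 3),
the space $\mathcal{U}(\theta^\dagger)$ is compact with respect to the supremum norm.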

A.2.4. Proof of 5 
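We first prove upper semicontinuity and then convexity:


(i) Fix any $\theta^\dagger \in [\theta_0, \bar\theta]$. The space $\mathcal{U}(\theta^\dagger)$ endowed with the distance induced
by the supremum norm is a complete metric space. Therefore we can use
the sequential characterization of upper semicontinuity: the functional $V_{\theta^\dagger}$ is
upper semicontinuous if and only if for every $u \in \mathcal{U}(\theta^\dagger)$ and $\varepsilon \in \mathbb{R}$ we have

\[
\lim_{n \to \infty} u_n = u, \quad \lim_{n \to \infty} V_{\theta^\dagger}(u_n) \geq \varepsilon
\implies V_{\theta^\dagger}(u) \geq \varepsilon.
\]

Let $(u_n)_{n \in \mathbb{N}}$ be an arbitrary sequence in $\mathcal{U}(\theta^\dagger)$ such that $(u_n)_{n \in \mathbb{N}}$ converges
uniformly to some $u \in \mathcal{U}(\theta^\dagger)$ with $\lim_{n \to \infty} V_{\theta^\dagger}(u_n) \geq \varepsilon$. Invoking again
theorem 25.7 in Rockafellar (1970), the sequence $(u_n')_{n \in \mathbb{N}}$ must converge
uniformly to $u'$. Since the function $\Lambda(\theta, \cdot, \cdot)$ is continuous for any $\theta \in \Theta$, we
must have $\Lambda(\theta, u(\theta), u'(\theta)) = \limsup_{n \to \infty} \Lambda(\theta, u_n(\theta), u_n'(\theta))$. Fatou's (reverse)
lemma in turn implies that

\[
V_{\theta^\dagger}(u) = \int_{\theta^\dagger}^{\bar\theta} \Lambda(\theta, u(\theta), u'(\theta))\, d\theta
= \int_{\theta^\dagger}^{\bar\theta} \limsup_{n \to \infty} \Lambda(\theta, u_n(\theta), u_n'(\theta))\, d\theta
\geq \limsup_{n \to \infty} \int_{\theta^\dagger}^{\bar\theta} \Lambda(\theta, u_n(\theta), u_n'(\theta))\, d\theta
= \limsup_{n \to \infty} V_{\theta^\dagger}(u_n) \geq \varepsilon.
\]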


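(ii) We now prove that $V_{\theta^\dagger}\colon \mathcal{U}(\theta^\dagger) \to \mathbb{R}$ is convex provided that $f'(\theta) \leq 0$.
Consider an initial pseudo-utility function $u \in \mathcal{U}(\theta^\dagger)$, and a variation
$h\colon [\theta^\dagger, \bar\theta] \to \mathbb{R}$. A variation is admissible if $u + h \in \mathcal{U}(\theta^\dagger)$. Therefore,
it must be that (i) $h(\theta^\dagger) = h(\bar\theta) = 0$ and that (ii) $(u + h)'$ exists almost
everywhere on the interval $[\theta^\dagger, \bar\theta]$, which implies that $h'$ exists almost
everywhere. Fix now an arbitrary $\varepsilon \in [0, 1]$. Then, $u + \varepsilon h \in \mathcal{U}(\theta^\dagger)$ since
$\mathcal{U}(\theta^\dagger)$ is a convex set. Consider the function $\phi\colon [0, 1] \to \mathbb{R}$ defined by:

\[
\phi(\varepsilon) = V_{\theta^\dagger}(u + \varepsilon h)
= \int_{\theta^\dagger}^{\bar\theta} \Lambda\big(\theta, u(\theta) + \varepsilon h(\theta), u'(\theta) + \varepsilon h'(\theta)\big)\, d\theta. \tag{A.5}
\]

For any admissible variation $h$ and any $u \in \mathcal{U}(\theta^\dagger)$, the directional derivative
of $V_{\theta^\dagger}$ at $u$ in direction $h$ is given by $\phi'(0)$ if it exists. We say that $V_{\theta^\dagger}$ is
Gâteaux differentiable at $u$ in the direction $h$ if and only if the directional
derivative $\phi'(0)$ at $u$ in the direction $h$ can be written as a linear functional
of $h$, i.e., $\phi'(0)$ is of the form $DV_{\theta^\dagger}(u)(h)$ for any $u$ and $h$. We then call
$DV_{\theta^\dagger}(u)$ the Gâteaux derivative of $V_{\theta^\dagger}$ at $u$. Similarly, we say that $V_{\theta^\dagger}$ is
twice-Gâteaux-differentiable if $DV_{\theta^\dagger}(u)(h)$ is Gâteaux differentiable at $u$ in
direction $k$, i.e., $\phi''(0)$ exists and can be written as a bilinear form in $(h, k)$,
denoted $D^2 V_{\theta^\dagger}(u)(h, k)$.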


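We then use the following characterization of convexity: the functional $V_{\theta^\dagger}$ is
convex if and only if $V_{\theta^\dagger}$ is twice-Gâteaux-differentiable and $D^2 V_{\theta^\dagger}(u)(h, h) \geq 0$
for any admissible direction $h$ (see for instance Clarke, 2013, Theorem
2.26). Remark that $\Lambda(\theta, \cdot, \cdot)$ is twice continuously differentiable. Hence,
differentiating twice the objective under the integral sign and evaluating the
expression at $\varepsilon = 0$, we obtain (subscripts $u$ and $u'$ denote partial derivatives of $\Lambda$
with respect to its second and third arguments):

\[
\phi''(0) = \int_{\theta^\dagger}^{\bar\theta} \Big[ \Lambda_{u'u'}\big(\theta, u(\theta), u'(\theta)\big)\, h'(\theta)^2
+ 2 \Lambda_{uu'}\big(\theta, u(\theta), u'(\theta)\big)\, h'(\theta) h(\theta)
+ \Lambda_{uu}\big(\theta, u(\theta), u'(\theta)\big)\, h(\theta)^2 \Big]\, d\theta.
\]

Integrating by parts, we then have:

\[
\phi''(0) = \int_{\theta^\dagger}^{\bar\theta} \Lambda_{u'u'}\big(\theta, u(\theta), u'(\theta)\big)\, h'(\theta)^2\, d\theta
+ \int_{\theta^\dagger}^{\bar\theta} \left[ \Lambda_{uu}\big(\theta, u(\theta), u'(\theta)\big)
- \frac{d}{d\theta} \Lambda_{uu'}\big(\theta, u(\theta), u'(\theta)\big) \right] h(\theta)^2\, d\theta,
\]

since $h(\theta^\dagger) = h(\bar\theta) = 0$. This proves that $V_{\theta^\dagger}$ is twice-Gâteaux-differentiable.
Replacing the terms in both integrals by their expression, we can deduce that:

\[
D^2 V_{\theta^\dagger}(u)(h, h) = \int_{\theta^\dagger}^{\bar\theta} \big(3u'(\theta) - 2\theta\big) f(\theta)\, h'(\theta)^2\, d\theta
+ \int_{\theta^\dagger}^{\bar\theta} \big(-f'(\theta)\big)\, h(\theta)^2\, d\theta.
\]

The second integral is always positive since $f'(\theta) \leq 0$ for any $\theta \in [\theta_0, \bar\theta]$
under assumption 1. Moreover, lemma 3 implies that $u'(\theta) \geq \theta$ as well as
$u'(\theta) \geq 0$ for all $\theta \geq \theta^\dagger$. Therefore, $3u'(\theta) - 2\theta \geq 0$ for all $\theta \geq \theta^\dagger$. Hence,
the first integral is also always positive, which concludes the proof.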


A.2.5. Proof of lemma 6 


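Assume $u \in \mathcal{U}(\theta^\dagger)$ is an extreme point and that there exists $\mathcal{I} = [a, b] \subseteq ]\theta_0, \bar\theta]$
such that $u''(\theta) > 0$, $\theta < u'(\theta) < \bar{t}(\theta)$ and $\underline{u}(\theta) < u(\theta) < \bar{u}(\theta)$ for all $\theta \in \mathcal{I}$. Then
let us fix some $\varepsilon > 0$ and define the function $h_\varepsilon\colon [\theta_0, \bar\theta] \to \mathbb{R}$ such that:

\[
h_\varepsilon(\theta) =
\begin{cases}
\dfrac{\varepsilon}{2} (\theta - a)^2 (\theta - b)^2 & \text{if } \theta \in \mathcal{I} \\[1.5ex]
0 & \text{if } \theta \in [\theta_0, \bar\theta] \setminus \mathcal{I}
\end{cases}
\]

The function $h_\varepsilon$ is twice continuously differentiable on $\mathcal{I}$ and its first and second
derivatives are respectively given by:

\[ h_\varepsilon'(\theta) = \varepsilon (\theta - a)(\theta - b)\big(2\theta - (a + b)\big), \]

and

\[ h_\varepsilon''(\theta) = \varepsilon \Big( (a + b)^2 + 2ab + 6\theta\big(\theta - (a + b)\big) \Big) \]

for all $\theta \in \mathcal{I}$. Hence, we must always have $u(\theta) + h_\varepsilon(\theta) = u(\theta)$ if $\theta \in [\theta_0, \bar\theta] \setminus \mathcal{I}$
and if $\theta \in \{a, b\}$, as well as $u'(\theta) + h_\varepsilon'(\theta) = u'(\theta)$ when $\theta \in \{a, b\}$. Moreover, it is
always possible to choose $\varepsilon > 0$ small enough so that $\underline{u}(\theta) < u(\theta) + h_\varepsilon(\theta) < \bar{u}(\theta)$
and $\theta < u'(\theta) + h_\varepsilon'(\theta) < \bar{t}(\theta)$ for every $\theta \in \mathcal{I}$, as well as $u''(\theta) + h_\varepsilon''(\theta) \geq 0$
wherever $u''$ exists, so that $u + h_\varepsilon$ stays convex on $[\theta_0, \bar\theta]$. This implies that for
a well chosen $\varepsilon$ we must have $u + h_\varepsilon \in \mathcal{U}(\theta^\dagger)$, a contradiction. Therefore, if
$u \in \mathcal{U}(\theta^\dagger)$ is an extreme point with $\underline{u}(\theta) < u(\theta) < \bar{u}(\theta)$ and $\theta < u'(\theta) < \bar{t}(\theta)$ on some interval, then
$u'$ must be constant on that interval, i.e., $u$ is affine on that interval. In other words,
if the function $u$ is an extreme point, it cannot be strictly convex on an interval
where neither bound on its derivative $u'$ is binding. Accordingly, assume now that
$\underline{u}(\theta) < u(\theta) < \bar{u}(\theta)$ but that $u'(\theta)$ is either equal to $\theta$ or $\bar{t}(\theta)$ on some interval.
We know from lemma 3 that $u'$ can coincide with the bound $\bar{t}(\theta)$ at most
at one point. Therefore, the only possibility for $u$ to be strictly convex on some
interval is that $u'(\theta) = \theta$ on that interval, so $u(\theta) = \theta^2/2 + c$ for some well chosen
constant $0 < c < 1/\gamma$.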
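The derivatives of the perturbation can be verified numerically. The sketch below is illustrative: the interval endpoints, the value of $\varepsilon$ and the function names are arbitrary assumptions. It compares the closed-form expressions for $h_\varepsilon'$ and $h_\varepsilon''$ with central finite differences, and evaluates $h_\varepsilon''$ at the midpoint of $[a, b]$, where this quadratic in $\theta$ is smallest.

```python
def h(theta, a, b, eps):
    """Perturbation h_eps on the interval [a, b]."""
    return 0.5 * eps * (theta - a) ** 2 * (theta - b) ** 2

def h_prime(theta, a, b, eps):
    return eps * (theta - a) * (theta - b) * (2 * theta - (a + b))

def h_second(theta, a, b, eps):
    return eps * ((a + b) ** 2 + 2 * a * b + 6 * theta * (theta - (a + b)))

def central_diff(fn, x, step=1e-5):
    """Symmetric finite-difference approximation of fn'(x)."""
    return (fn(x + step) - fn(x - step)) / (2 * step)

a, b, eps = 0.2, 0.9, 0.05          # arbitrary illustrative values
grid = [a + k * (b - a) / 10 for k in range(11)]

err_first = max(
    abs(central_diff(lambda z: h(z, a, b, eps), x) - h_prime(x, a, b, eps))
    for x in grid
)
err_second = max(
    abs(central_diff(lambda z: h_prime(z, a, b, eps), x) - h_second(x, a, b, eps))
    for x in grid
)
midpoint_curvature = h_second(0.5 * (a + b), a, b, eps)
print(err_first, err_second, midpoint_curvature)
```

Since $h_\varepsilon''$ equals $-\varepsilon (b - a)^2 / 2$ at the midpoint, $u + h_\varepsilon$ remains convex on the interval as soon as $u''$ is bounded below there by $\varepsilon (b - a)^2 / 2$, which is why $\varepsilon$ has to be chosen small.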


A.2.6. Proof of lemma 8 


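For any $u \in \mathcal{U}(\theta^\dagger)$ and any admissible variation $h$, the directional derivative of $V_{\theta^\dagger}$
at $u$ in direction $h$ is given by:

\[
\phi'(0) = \int_{\theta^\dagger}^{\bar\theta} \Big( \Lambda_u\big(\theta, u(\theta), u'(\theta)\big)\, h(\theta)
+ \Lambda_{u'}\big(\theta, u(\theta), u'(\theta)\big)\, h'(\theta) \Big)\, d\theta \tag{A.6}
\]

where $\phi\colon [0, 1] \to \mathbb{R}$ has been defined in equation (A.5) and subscripts denote partial
derivatives of $\Lambda$ with respect to its second and third arguments. For convenience,
denote $\alpha(\theta) = \Lambda_u(\theta, u(\theta), u'(\theta))$ and $\beta(\theta) = \Lambda_{u'}(\theta, u(\theta), u'(\theta))$. Integrating
equation (A.6) by parts, we obtain:

\[
\phi'(0) = \int_{\theta^\dagger}^{\bar\theta} \big( \alpha(\theta) - \beta'(\theta) \big)\, h(\theta)\, d\theta \tag{A.7}
\]

which proves that $V_{\theta^\dagger}$ is Gâteaux differentiable. Moreover:

\[ \alpha(\theta) = u'(\theta) f(\theta), \tag{A.8} \]

and

\[
\beta(\theta) = \Big[ \Big( u(\theta) + \tfrac{1}{2} u'(\theta)^2 - \theta u'(\theta) \Big) + u'(\theta)\big(u'(\theta) - \theta\big) \Big] f(\theta), \tag{A.9}
\]

for all $\theta \in [\theta^\dagger, \bar\theta]$. The function $\beta$ defined in equation (A.9) is clearly differentiable
almost everywhere on $[\theta^\dagger, \bar\theta]$ and has a derivative given by:

\[
\begin{aligned}
\beta'(\theta) ={}& \Big( 2u''(\theta)\big(u'(\theta) - \theta\big) + u'(\theta)\big(u''(\theta) - 1\big) \Big) f(\theta) \\
&+ \Big[ \Big( u(\theta) + \tfrac{1}{2} u'(\theta)^2 - \theta u'(\theta) \Big) + u'(\theta)\big(u'(\theta) - \theta\big) \Big] f'(\theta)
\end{aligned} \tag{A.10}
\]

wherever it exists. Subtracting equations (A.8) and (A.10) and plugging the result
in equation (A.7) yields the desired result.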


A.2.7. Proof of lemma 9 
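First of all, we know from lemma 8 that the Gâteaux derivative of $V_{\theta^\dagger}$ has the form:

\[
DV_{\theta^\dagger}(u)(h) = \int_{\theta^\dagger}^{\bar\theta} \Big[ -\beta(\theta) f'(\theta)
+ \Big( u'(\theta) - 2u''(\theta)\big(u'(\theta) - \theta\big) - u'(\theta)\big(u''(\theta) - 1\big) \Big) f(\theta) \Big]\, h(\theta)\, d\theta. \tag{A.11}
\]

Remark that $\beta(\theta) = x(\theta) + u'(\theta)(u'(\theta) - \theta) \geq 0$ for all $\theta \in [\theta^\dagger, \bar\theta]$ since $x(\theta) \geq 0$,
$u'(\theta) \geq 0$ and $u'(\theta) \geq \theta$ for all $\theta \in [\theta^\dagger, \bar\theta]$. Moreover, under assumption 1,
$-f'(\theta) \geq 0$. Therefore, the term $-\beta(\theta) f'(\theta)$ is positive for all $\theta \in [\theta^\dagger, \bar\theta]$. As
a consequence, the sign of the integrand in equation (A.11) only depends on
the second term. We are going to show that this term is always positive when
$u \in \mathcal{E}(\theta^\dagger)$.

Let $u \in \mathcal{E}(\theta^\dagger)$. We know from lemma 6 that, on any subinterval of $[\theta^\dagger, 0[$,
the function $u$ must have the form $u(\theta) = a\theta + b$ for some constants $a \geq 0$ and
$b \in \mathbb{R}$, and that on any subinterval of $[0, \bar\theta]$, the function $u$ must either have the
form $u(\theta) = a\theta + b$ or $u(\theta) = \theta^2/2 + c$ for some constants $a \geq 0$, $b \in \mathbb{R}$ and
$c \in [0, 1/\gamma]$. First assume that $u(\theta) = a\theta + b$ on some interval $[\underline{x}, \bar{x}] \subseteq [\theta^\dagger, \bar\theta]$.
Then we must have $u''(\theta) = 0$ on $[\underline{x}, \bar{x}]$, which implies that: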


(-B(0) f°) 42 u'(0) (u'(o) 2 u(0)) > 0, (A.12) 
——{——— = 
>0 >0 50 


for any @ € [x,x]. Now, assume that 6 > 0 and that u(@) = 67/2 + c for some 
constant c € [0, 1/y] on some interval [x, x] ¢ [0,0]. We thus have u’(@) = 6 
and u’”’(@) = 1, and therefore 


[-nar +u'(8) | (u'(@) _ u(6)) > 0. (A.13) 
SS 
>0 >0 50 


for any 0 € [x, x]. equations (A.12) and (A.13) both imply that the integrand of the 
Gateaux derivative is always positive along increasing affine arcs and increasing 
quadratic arcs. Hence, DV,:(u)(u' — uw) > 0 for any u € €(6"). 


A.3. PROOFS FOR COMPARATIVE STATICS 
For notational ease we let $\theta_\gamma = \theta(\gamma)$ and $t_\gamma = t(\gamma)$.
A.3.1. Proof of 2 
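The optimal cutoff $t_\gamma$ is the solution to the equation $\psi(t) = 0$ where

\[
\psi(t) =
\begin{cases}
t - \dfrac{F(t) - F(\theta(t))}{f(\theta(t))} & \text{if } t \in [0, \bar\theta[ \\[2ex]
t - \dfrac{1 - F(\theta(t))}{f(\theta(t))} & \text{if } t \in [\bar\theta, \bar\theta + \sqrt{2/\gamma}]
\end{cases}
\]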
Remark that 
= 1 : e 
e568) if y < 1/c(0, 0) 
W(0) = = 2 
1-F\é-.,/= 
a= 1-F(6- V7) if y > 1/c(6,8) 
(8-43) 
The function $\gamma \mapsto (1 - F(\bar\theta - \sqrt{2/\gamma}))/f(\bar\theta - \sqrt{2/\gamma})$ is clearly increasing in $\gamma$
under assumption 1 so y(@) > @ — 1/f(@) for any y > 0. Hence, if f(@) > 1/6, 
then w(@) > 0 for all y > 0. As in the proof of proposition 1 in appendix A.1, the 
Intermediate Value Theorem implies that r) € [0, 0] for any y > 0. 


A.3.2. Proof of proposition 2 


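Under any pass-fail mechanism, the payoff of the designer as a function of $t$ and $\gamma$
writes:

\[
V(t, \gamma) =
\begin{cases}
t\,\big(F(t) - F(t - \sqrt{2/\gamma})\big) + \displaystyle\int_t^{\bar\theta} \theta f(\theta)\, d\theta & \text{if } t \in [0, \bar\theta[ \\[1.5ex]
t\,\big(1 - F(t - \sqrt{2/\gamma})\big) & \text{if } t \in [\bar\theta, \bar\theta + \sqrt{2/\gamma}]
\end{cases}
\]

For any $t \geq 0$ and $\gamma > 0$ we have:

\[ V_{t\gamma}(t, \gamma) = \frac{1}{\gamma\sqrt{2\gamma}}\, f'\big(t - \sqrt{2/\gamma}\big) \leq 0. \]

Hence, $V(t, \gamma)$ is submodular in $(t, \gamma)$. By Topkis' theorem, we can conclude that
$t_\gamma$ is a decreasing function of $\gamma$. Equivalently, we can solve for the optimal pass-fail
rule by optimizing over the last approved type, through the change of variable
$\theta = t - \sqrt{2/\gamma}$. The designer's objective function in the new coordinate system
$(\theta, \gamma)$ thus writes as follows: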

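\[
V(\theta, \gamma) =
\begin{cases}
\big(\theta + \sqrt{2/\gamma}\big)\big(F(\theta + \sqrt{2/\gamma}) - F(\theta)\big) + \displaystyle\int_{\theta + \sqrt{2/\gamma}}^{\bar\theta} z f(z)\, dz & \text{if } \theta \in [-\sqrt{2/\gamma}, \bar\theta - \sqrt{2/\gamma}[ \\[1.5ex]
\big(\theta + \sqrt{2/\gamma}\big)\big(1 - F(\theta)\big) & \text{if } \theta \in [\bar\theta - \sqrt{2/\gamma}, \bar\theta]
\end{cases}
\]

Here, we have

\[
V_{\theta\gamma}(\theta, \gamma) = -\frac{1}{\gamma\sqrt{2\gamma}}\, \underbrace{\big( f(\theta + \sqrt{2/\gamma}) - f(\theta) \big)}_{\leq 0} \geq 0
\]

if $\theta \in [-\sqrt{2/\gamma}, \bar\theta - \sqrt{2/\gamma}[$, as well as

\[ V_{\theta\gamma}(\theta, \gamma) = \frac{1}{\gamma\sqrt{2\gamma}}\, f(\theta) \geq 0 \]

if $\theta \in [\bar\theta - \sqrt{2/\gamma}, \bar\theta]$. Therefore, $V(\theta, \gamma)$ is supermodular in $(\theta, \gamma)$, whence $\theta_\gamma$ is
increasing in $\gamma$.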
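Finally, remember that $t_\gamma$ must solve the equation

\[ \psi(t, \gamma) = 0 \]

where

\[ \psi(t, \gamma) = t - \frac{F(t) - F\big(t - \sqrt{2/\gamma}\big)}{f\big(t - \sqrt{2/\gamma}\big)}. \]

Remark that we have:

\[ \lim_{\gamma \to +\infty} \psi(t, \gamma) = t, \]

which implies that $t_\gamma$ and $\theta_\gamma = t_\gamma - \sqrt{2/\gamma}$ both converge to 0 as $\gamma$ becomes
arbitrarily large.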
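These comparative statics can be illustrated numerically. The sketch below is a hedged example, not part of the proof: the truncated exponential density on $[-2, 2]$, the grid of values of $\gamma$ and the function names are assumptions. For each $\gamma$ it solves $\psi(t, \gamma) = 0$ by bisection and records the cutoff $t_\gamma$ and the last approved type $\theta_\gamma = t_\gamma - \sqrt{2/\gamma}$.

```python
import math

THETA_LO, THETA_HI = -2.0, 2.0                      # assumed type support
NORM = math.exp(-THETA_LO) - math.exp(-THETA_HI)

def density(theta):
    return math.exp(-theta) / NORM                  # decreasing density (assumption)

def cdf(theta):
    theta = min(max(theta, THETA_LO), THETA_HI)
    return (math.exp(-THETA_LO) - math.exp(-theta)) / NORM

def optimal_cutoff(gamma, iterations=200):
    """Zero of psi(t, gamma) on [0, theta_bar + sqrt(2/gamma)], by bisection."""
    spread = math.sqrt(2.0 / gamma)

    def psi(t):
        return t - (cdf(t) - cdf(t - spread)) / density(t - spread)

    lo, hi = 0.0, THETA_HI + spread
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if psi(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gammas = [1.0, 2.0, 4.0, 8.0, 1e4]
cutoffs = [optimal_cutoff(g) for g in gammas]
last_types = [t - math.sqrt(2.0 / g) for t, g in zip(cutoffs, gammas)]
for g, t, th in zip(gammas, cutoffs, last_types):
    print(g, t, th)
```

In this example the cutoff falls and the last approved type rises as $\gamma$ grows, and both are close to 0 for the largest value of $\gamma$.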


A.3.3. Proof of proposition 3 


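Designer's welfare. The designer's optimal expected payoff is given by

\[
V(\gamma) = t(\gamma)\big(F(t(\gamma)) - F(\theta(\gamma))\big) + \int_{t(\gamma)}^{\bar\theta} \theta f(\theta)\, d\theta.
\]

Hence its derivative is given by

\[
\begin{aligned}
V'(\gamma) &= t'(\gamma)\big(F(t(\gamma)) - F(\theta(\gamma))\big)
+ t(\gamma)\big(t'(\gamma) f(t(\gamma)) - \theta'(\gamma) f(\theta(\gamma))\big)
- t'(\gamma)\, t(\gamma) f(t(\gamma)) \\
&= t'(\gamma)\big(F(t(\gamma)) - F(\theta(\gamma))\big) - t(\gamma)\, \theta'(\gamma) f(\theta(\gamma)) \leq 0.
\end{aligned}
\]

Therefore the designer's payoff is always decreasing in $\gamma$.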


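Agent's ex-interim welfare. The ex-interim welfare of an agent of type $\theta$ is given
by

\[
U(\theta, \gamma) =
\begin{cases}
0 & \text{if } \theta \in [\underline\theta, \theta(\gamma)[ \\
1 - \gamma c(t(\gamma), \theta) & \text{if } \theta \in [\theta(\gamma), t(\gamma)[ \\
1 & \text{if } \theta \in [t(\gamma), \bar\theta]
\end{cases}
\]

A marginal change in $\gamma$ only has an effect for agents whose types are in the interval
$[\theta(\gamma), t(\gamma)[$, where we have

\[
U_\gamma(\theta, \gamma) = \underbrace{-c(t(\gamma), \theta)}_{\text{Direct effect} \,\leq\, 0}
\; \underbrace{-\, \gamma t'(\gamma)\, c_t(t(\gamma), \theta)}_{\text{Indirect effect} \,\geq\, 0}.
\]

Given the quadratic form of the cost function we have

\[
U_\gamma(\theta, \gamma) = \frac{t(\gamma) - \theta}{2} \big( \theta - t(\gamma) - 2\gamma t'(\gamma) \big)
\]

and hence $U_\gamma(\theta, \gamma) \geq 0$ if and only if $\theta \geq t(\gamma) + 2\gamma t'(\gamma) \equiv \tilde\theta(\gamma)$. Since
$t'(\gamma) \leq 0$ we have $\tilde\theta(\gamma) \leq t(\gamma)$. Moreover, since $t(\gamma) = \theta(\gamma) + \sqrt{2/\gamma}$ we
have $\tilde\theta(\gamma) = \theta(\gamma) + 2\gamma\theta'(\gamma) \geq \theta(\gamma)$ because $\theta'(\gamma) \geq 0$. Therefore, there exists a
threshold $\tilde\theta(\gamma) \in ]\theta(\gamma), t(\gamma)[$ above which $U_\gamma(\theta, \gamma) \geq 0$ and below which $U_\gamma(\theta, \gamma) < 0$.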
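The two algebraic steps used for the agent's welfare can be written out explicitly. This is a worked sketch assuming the quadratic cost $c(t, \theta) = (t - \theta)^2/2$, so that $c_t(t, \theta) = t - \theta$; the functional form is inferred from the rest of the appendix rather than restated here, and the threshold is written $\tilde\theta(\gamma)$ as a notational assumption.

```latex
\begin{align*}
U_\gamma(\theta,\gamma)
  &= -\tfrac{1}{2}\big(t(\gamma)-\theta\big)^2
     - \gamma\, t'(\gamma)\big(t(\gamma)-\theta\big) \\
  &= \frac{t(\gamma)-\theta}{2}\,
     \Big(\theta - t(\gamma) - 2\gamma\, t'(\gamma)\Big), \\[1ex]
t(\gamma) = \theta(\gamma) + \sqrt{2/\gamma}
  &\;\Longrightarrow\;
  t'(\gamma) = \theta'(\gamma) - \frac{1}{\gamma\sqrt{2\gamma}}, \\
t(\gamma) + 2\gamma\, t'(\gamma)
  &= \theta(\gamma) + \sqrt{2/\gamma} + 2\gamma\,\theta'(\gamma) - \frac{2}{\sqrt{2\gamma}}
   = \theta(\gamma) + 2\gamma\,\theta'(\gamma).
\end{align*}
```

The last equality uses $2/\sqrt{2\gamma} = \sqrt{2/\gamma}$, which is why the threshold can be expressed either through $t(\gamma)$ or through $\theta(\gamma)$.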


A.4. PROOF OF PROPOSITION 5 


First of all, (IC) implies that $\sigma$ must be non-decreasing on $T$ (see the proof of
lemma 1 in appendix A.2.1). This implies that $\tau(\theta) \geq \theta$ for any $\theta \in \Theta$ under
the welfare-optimal selection rule. Moreover, it is still true that $\tau(\theta) \leq \bar{t}(\theta)$ for
all $\theta \in \Theta$. Remark also that any implementable pseudo-utility function must be
bounded below by $\underline{u}(\theta) = \theta^2/2$ and above by $\bar{u}(\theta) = 1/\gamma + \theta^2/2$ for all $\theta \in \Theta$. The
lower bound corresponds to the pseudo-utility under the selection rule $\sigma(t) = 0$
for all $t \in T$, while the upper bound corresponds to the case where the allocation
is given by $\sigma(t) = 1$ for any $t \in T$. We call admissible any mechanism such
that $\sigma\colon T \to [0, 1]$ is non-decreasing, and such that $\tau\colon \Theta \to T$ is non-decreasing
and bounded between $\theta$ and $\bar{t}(\theta)$. We have the following characterization of
admissible mechanisms.


Lemma 14. A mechanism $(\sigma, \tau)$ is admissible if, and only if, the pseudo-utility $u$
implemented by $\sigma$ satisfies the following properties:


(i) The function $u$ is convex over $\Theta$. As a result, $u$ is differentiable a.e. on $\Theta$ and
satisfies the envelope condition $u'(\theta) = \tau(\theta)$ wherever it is differentiable.


(ii) $\underline{u}(\theta) \leq u(\theta) \leq \bar{u}(\theta)$ for any $\theta \in \Theta$.


(iii) $\theta \leq u'(\theta) \leq \bar{t}(\theta)$ for any $\theta \in \Theta$.
Using the characterization from lemma 14, we can recast the program of the
planner as follows:

\[
\max_{u \in \mathcal{U}} \int_{\Theta} u'(\theta) \left( u(\theta) + \frac{u'(\theta)^2}{2} - \theta u'(\theta) \right) f(\theta)\, d\theta
+ \alpha \int_{\Theta} \left( u(\theta) - \frac{\theta^2}{2} \right) f(\theta)\, d\theta,
\]

where $\mathcal{U} = \{u \in C(\Theta) \mid u \text{ convex},\ \underline{u}(\theta) \leq u(\theta) \leq \bar{u}(\theta),\ \theta \leq u'(\theta) \leq \bar{t}(\theta)\}$.
Remark that the variational program of the planner is identical to the one of the
principal up to an additional linear term. The rest of the proof thus follows the
same methodology as for theorem 1 and details are therefore omitted. First, we
parametrize the problem with respect to the first initial type $\theta^\dagger$ investing in a
positive final type. Then, we prove that the set of convex and increasing functions
on the interval $[\theta^\dagger, \bar\theta]$ such that $u(\theta^\dagger) = \underline{u}(\theta^\dagger)$ is compact and convex for any
$\theta^\dagger \in \Theta$. It is not hard to show that the objective functional admits exactly the
same second-order Gâteaux derivative as the objective functional of the principal.
Therefore, it is also convex under assumption 1. Finally, the necessary conditions
on extreme points are identical to lemma 6 and the tangent inequality applies in
the same way since the Gâteaux derivative is identical up to an additive term in $\alpha$.

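Now, let us determine the optimal allocation cutoff. Under a pass-fail rule with
cutoff $t$, the welfare is given by:

\[
W(t) = t\,\big(F(t) - F(\theta(t))\big) + \int_t^{\bar\theta} \theta f(\theta)\, d\theta
+ \alpha \left( 1 - F(\theta(t)) - \int_{\theta(t)}^{t} \gamma c(t, \theta) f(\theta)\, d\theta \right).
\]

Whenever the first-order condition $W'(t) = 0$ has a solution, it is given by the
solution to the equation:

\[
t = \frac{F(t) - F(\theta(t))}{f(\theta(t))} \Big( 1 - \alpha\gamma\big(t - \mathbb{E}[\theta \mid \theta(t) \leq \theta \leq t]\big) \Big). \tag{FOCW}
\]

The previous equation is identical to (FOC') up to the multiplicative term
$1 - \alpha\gamma(t - \mathbb{E}[\theta \mid \theta(t) \leq \theta \leq t])$. Since $\alpha\gamma \geq 0$ and $t - \mathbb{E}[\theta \mid \theta(t) \leq \theta \leq t] \geq 0$, we
must have $1 - \alpha\gamma(t - \mathbb{E}[\theta \mid \theta(t) \leq \theta \leq t]) \leq 1$. As a result, whenever it exists, the
solution to (FOCW) must be lower than the solution to (FOC').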
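The comparison between the two cutoffs can be illustrated numerically. The sketch below is an example under stated assumptions: the truncated exponential density on $[-2, 2]$, the values $\gamma = 2$ and $\alpha = 0.5$, the midpoint-rule quadrature and all function names are illustrative choices. It solves the designer's condition $t = (F(t) - F(\theta(t)))/f(\theta(t))$ and the welfare condition, which multiplies the right-hand side by $1 - \alpha\gamma(t - \mathbb{E}[\theta \mid \theta(t) \leq \theta \leq t])$, restricting attention to cutoffs in $[0, \bar\theta[$.

```python
import math

THETA_LO, THETA_HI = -2.0, 2.0          # assumed type support
GAMMA, ALPHA = 2.0, 0.5                 # assumed cost scale and welfare weight
SPREAD = math.sqrt(2.0 / GAMMA)         # t - theta(t)
NORM = math.exp(-THETA_LO) - math.exp(-THETA_HI)

def density(theta):
    return math.exp(-theta) / NORM

def cdf(theta):
    theta = min(max(theta, THETA_LO), THETA_HI)
    return (math.exp(-THETA_LO) - math.exp(-theta)) / NORM

def ratio(t):
    """(F(t) - F(theta(t))) / f(theta(t)) for t in [0, theta_bar[."""
    return (cdf(t) - cdf(t - SPREAD)) / density(t - SPREAD)

def mean_gap(t, steps=2000):
    """t - E[theta | theta(t) <= theta <= t], computed with a midpoint rule."""
    width = SPREAD / steps
    num = den = 0.0
    for k in range(steps):
        theta = t - SPREAD + (k + 0.5) * width
        num += (t - theta) * density(theta)
        den += density(theta)
    return num / den

def foc_designer(t):
    return t - ratio(t)

def foc_welfare(t):
    return t - ratio(t) * (1.0 - ALPHA * GAMMA * mean_gap(t))

def bisect(fn, lo, hi, tol=1e-10):
    """Zero of a continuous function with fn(lo) < 0 < fn(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_designer = bisect(foc_designer, 0.0, THETA_HI)
t_welfare = bisect(foc_welfare, 0.0, THETA_HI)
print(t_designer, t_welfare)
```

With these illustrative primitives the welfare-maximizing cutoff is strictly below the designer's cutoff, as the multiplicative term is smaller than 1.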
A.5. PROOF OF LEMMA 10


Let us first prove (i). First, let $\tau$ be (A-IC) under $\sigma$. Note that:


i: Py-r(8)ar(s) 7 (ds | 1) r(dt | 0) x(d0) 
iy ee. 


i, ory 2(8) o(ds |) r(dt | 8) (dd) 
SxTxO 
and 


i iy.2(8)(1 — ae.r(s)) (ds | 1) t(dt | 6) (dé) 
é..7(0) = SxTx© < 0, 


(1 — ag,r(s)) o(ds | t) t(dr | 6) (dé) 
SxTxO 


since, by definition, @,,; and T are best responses to each other under (c, S). 
Hence, it is individually rational for the designer to follow action recommendations.
That is, @¢,7(1) = 1 and a, -(0) = 0. To prove (ii), note that 


por(t’, 6) = i. gr(1) 6(1 | 1) (dt | 6) 
: [ PO COLAC eae Ca 
xT 


since a¢7(1) = 1 by Gi), and ¢(1|t) = o({s € S : aog.7(s) = 1}\t) for every ¢. 
Using the fact that 7 is (A-IC) under (c, S) and that interim approval probabilities 
are unchanged we have that 


Pox t’, 0) — C(t’, 8) = po x(t’, 8) — C(7’, 8) 
< Por(T, 0) — C(t, @) = po r(T, 0) — C(t, 8) 


for every type $\theta$ and investment strategy $\tau'$, which proves (iii). Finally, combining
(ii) and (iii) proves (iv).
B. MATHEMATICAL APPENDIX FOR
CHAPTER 2


B.1. PROOF FOR PROPOSITION 7


Let $\Theta$ be any Polish space and let $\Delta(\Theta)$ be the set of probability measures on $\Theta$
endowed with its Borel $\sigma$-algebra. Let also $C_b(\Theta)$ be the set of bounded continuous
and Borel-measurable real-valued functions on $\Theta$.

For any $\eta, \mu \in \Delta(\Theta)$, by application of the Donsker-Varadhan variational
formula (see Dupuis and Ellis, 1997, Lemma 1.4.3) we have

\[
C(\eta, \mu) = \sup_{u(a, \cdot) \in C_b(\Theta)} \left[ \int_\Theta \rho u(a, \theta)\, \eta(d\theta)
- \ln \int_\Theta \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta) \right]. \tag{B.1}
\]

Taking the Legendre-Fenchel dual of the variational equality (B.1) (see Dupuis
and Ellis, 1997, Proposition 1.4.2) we get

\[
\ln \int_\Theta \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta)
= \sup_{\eta \in \Delta(\Theta)} \left[ \int_\Theta \rho u(a, \theta)\, \eta(d\theta) - C(\eta, \mu) \right]. \tag{B.2}
\]


Hence, we have 


@,() = — In ( [ exp (pu(a, 6)) uae) 
p \Je 


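for any $a \in A$, any $\mu \in \Delta(\Theta)$ and any $\rho \in \mathbb{R}_+^*$. Moreover, the supremum in
equation (B.2) is attained uniquely by the probability measure $\eta_a(\mu) \in \Delta(\Theta)$
defined by

\[
\eta_a(\mu)(\tilde\Theta) = \frac{\int_{\tilde\Theta} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta)}{\int_{\Theta} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta)},
\]

for any Borel set $\tilde\Theta \subseteq \Theta$ (see, again, Dupuis and Ellis, 1997, Proposition 1.4.2).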
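The duality between the log-moment generating function and the entropy-penalized expected payoff can be checked on a finite state space. The sketch below is illustrative: the three-state prior, the payoffs, the value of $\rho$ and the function names are assumptions, and $C$ is taken to be the relative entropy (Kullback-Leibler divergence), which is the object the Donsker-Varadhan formula characterizes. It verifies that the exponentially tilted belief attains the value $\ln \int \exp(\rho u)\, d\mu$ and that other beliefs do worse.

```python
import math

def tilted_belief(prior, payoffs, rho):
    """Belief proportional to prior * exp(rho * payoff), normalized to sum to 1."""
    weights = [p * math.exp(rho * u) for p, u in zip(prior, payoffs)]
    total = sum(weights)
    return [w / total for w in weights]

def relative_entropy(eta, mu):
    """Kullback-Leibler divergence of eta from mu on a finite state space."""
    return sum(e * math.log(e / m) for e, m in zip(eta, mu) if e > 0)

def objective(eta, prior, payoffs, rho):
    """Expected scaled payoff under eta minus the entropy cost of holding eta."""
    expected = sum(e * rho * u for e, u in zip(eta, payoffs))
    return expected - relative_entropy(eta, prior)

prior = [0.5, 0.3, 0.2]      # hypothetical Bayesian belief
payoffs = [0.0, 1.0, 2.0]    # hypothetical payoffs u(a, theta) for a fixed action a
rho = 1.5

log_mgf = math.log(sum(p * math.exp(rho * u) for p, u in zip(prior, payoffs)))
eta_star = tilted_belief(prior, payoffs, rho)
print(log_mgf, objective(eta_star, prior, payoffs, rho))
print(objective(prior, prior, payoffs, rho), objective([0.2, 0.3, 0.5], prior, payoffs, rho))
```

The tilted belief shifts mass toward high-payoff states relative to the prior, and any other belief tried here yields a strictly lower value of the objective.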


B.2. OVEROPTIMISM ABOUT PREFERRED OUTCOMES 


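Fix an $a \in A$ and let $\Theta_a$ be the (measurable) set of states such that
$\Theta_a = \arg\max_{\theta \in \Theta} u(a, \theta)$. Define $\delta(a, \theta) = u(a, \theta) - u(a, \theta^*)$ for all $\theta$ and some
$\theta^* \in \Theta_a$. Remark that $\eta_a(\mu)(\Theta_a)$ can be expressed as follows:

\[
\eta_a(\mu)(\Theta_a) = \frac{\int_{\Theta_a} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta)}{\int_{\Theta} \exp\big(\rho u(a, \theta)\big)\, \mu(d\theta)}
= \frac{\mu(\Theta_a)}{\mu(\Theta_a) + \int_{\Theta \setminus \Theta_a} \exp\big(\rho \delta(a, \theta)\big)\, \mu(d\theta)}.
\]

Let us define the function

\[
h(\rho) = \frac{\mu(\Theta_a)}{\mu(\Theta_a) + \int_{\Theta \setminus \Theta_a} \exp\big(\rho \delta(a, \theta)\big)\, \mu(d\theta)}
\]

for any $\rho \in \mathbb{R}_+$.
First, remark that $h(0) = \mu(\Theta_a)$. Moreover, by the Leibniz integral rule, we have

\[
h'(\rho) = \frac{-\mu(\Theta_a) \int_{\Theta \setminus \Theta_a} \delta(a, \theta) \exp\big(\rho \delta(a, \theta)\big)\, \mu(d\theta)}{\left( \mu(\Theta_a) + \int_{\Theta \setminus \Theta_a} \exp\big(\rho \delta(a, \theta)\big)\, \mu(d\theta) \right)^2} \geq 0
\]

for any $\rho \in \mathbb{R}_+$, since $\delta(a, \theta) \leq 0$. Finally, we also have that $\lim_{\rho \to +\infty} h(\rho) = 1$.
Hence the probability of payoff-maximizing states is bounded below by the Bayesian
posterior $\mu(\Theta_a)$, is always increasing in $\rho$ and converges to 1 from below. Hence, a
wishful Receiver always puts more probability mass on $\Theta_a$ than a Bayesian and
eventually believes that the state belongs to $\Theta_a$ with probability 1 when $\rho$ becomes
large.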
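The behaviour of $h$ can be illustrated on a finite state space. The sketch below is a hedged example: the three-state prior, the payoffs, the grid of values of $\rho$ and the function name are assumptions. It evaluates $h(\rho) = \mu(\Theta_a) / \big(\mu(\Theta_a) + \sum_{\theta \notin \Theta_a} \exp(\rho\,\delta(a, \theta))\,\mu(\theta)\big)$, where $\delta(a, \theta) \leq 0$ is the payoff shortfall relative to the best states.

```python
import math

def mass_on_best_states(prior, payoffs, rho):
    """h(rho): mass the distorted belief puts on the payoff-maximizing states."""
    best = max(payoffs)
    mu_best = sum(p for p, u in zip(prior, payoffs) if u == best)
    rest = sum(
        p * math.exp(rho * (u - best))      # u - best is the shortfall delta <= 0
        for p, u in zip(prior, payoffs)
        if u != best
    )
    return mu_best / (mu_best + rest)

prior = [0.5, 0.3, 0.2]      # hypothetical Bayesian belief
payoffs = [0.0, 1.0, 2.0]    # hypothetical payoffs u(a, theta) for a fixed action a
rhos = [0.0, 0.5, 1.0, 2.0, 5.0, 50.0]
values = [mass_on_best_states(prior, payoffs, r) for r in rhos]
for r, v in zip(rhos, values):
    print(r, v)
```

At $\rho = 0$ the function returns the prior mass of the best state, and the mass increases toward 1 along the grid.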


B.3. PROOF FOR LEMMA 11 


Let us study the properties of the belief threshold $\mu^W$ as a function of $\rho$ and payoffs.
First of all, let us define the function 


exp(pUu,) — exp(pu,) 


WwW = — 
pas exp(Puy) — exp(pu,) + exp(pu1) — exp(puo) 
for any $\rho \in \mathbb{R}_+^*$. To avoid notational burden, we omit the superscript $W$ in the proof.
We can find the limit of $\mu(\rho)$ at 0 by applying l'Hôpital's rule


Uy, CX Ug) — Uy ex u 
lim a(p) = lim Uy EXP(PUy) p(pu,) 


p20 Up EXP(PUy) — Uy, Exp(pUu,) + U1 Exp(pu1) — Uo eExp(pUuo) 
Ug — Uy 


Uy —u 1 + uy _- uo 

= i. 
So, we are back to the case of a Bayesian Receiver whenever the cost of distortion 
becomes infinitely high. After multiplying by exp(—puj) at the numerator and the 
denominator of (pe) we get 


1 — exp(p(u, — Up)) 


Or eG ha) =e =a 


So the limit of sz at infinity only depends on the sign of 7) — Uy as, by assumption, 
u, — Uy < Oand uy — uy < 0. Hence, limp 4.0 u(e) = 1 when 7 — uy < 0 and 
limp—+oo H(p) = 0 when 1; — uy > 0. Finally, in the case where uy = “1 we have 


1 — exp(p(u, - U)) 


Fa a ah Dn (ER Py CTT (CET) 
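
The limits derived above can be checked numerically. The sketch below is only an illustration under assumed payoff values. The argument names `u00`, `u10`, `u11`, `u01` are hypothetical shorthands for the payoffs written with action as subscript and state as superscript.

```python
import math

def mu_bayes(u00, u10, u11, u01):
    """Bayesian threshold; uAS is the payoff of action A in state S."""
    return (u00 - u10) / (u00 - u10 + u11 - u01)

def mu_wishful(rho, u00, u10, u11, u01):
    """Wishful threshold for a given rho > 0, same payoff convention."""
    num = math.exp(rho * u00) - math.exp(rho * u10)
    den = num + math.exp(rho * u11) - math.exp(rho * u01)
    return num / den

# Assumed payoff profiles (u00, u10, u11, u01), each with u00 > u10 and u11 > u01.
low = (1.0, 0.0, 2.0, 0.5)    # highest payoff is in state 1
high = (2.0, 0.0, 1.0, 0.0)   # highest payoff is in state 0
tie = (1.0, 0.0, 1.0, 0.5)    # equal highest payoffs

near_zero = mu_wishful(1e-6, *low)   # close to the Bayesian threshold
limit_low = mu_wishful(40.0, *low)   # close to 0
limit_high = mu_wishful(40.0, *high) # close to 1
limit_tie = mu_wishful(40.0, *tie)   # close to 1/2
```

The value `rho = 40.0` is a stand-in for "large"; much larger values would overflow `math.exp` in this naive implementation.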


Let us now check the variations of the function. After differentiating with respect to $\rho$ and rearranging terms, one can remark that $\mu(\rho)$ must verify the following logistic differential equation with varying coefficient

$$\mu'(\rho) = \alpha(\rho)\,\mu(\rho)\,(1 - \mu(\rho)),$$

where

$$\alpha(\rho) = \frac{u_0^0 \exp(\rho u_0^0) - u_1^0 \exp(\rho u_1^0)}{\exp(\rho u_0^0) - \exp(\rho u_1^0)} - \frac{u_1^1 \exp(\rho u_1^1) - u_0^1 \exp(\rho u_0^1)}{\exp(\rho u_1^1) - \exp(\rho u_0^1)}$$

for all $\rho \in \mathbb{R}_+^*$, together with the initial condition $\mu(0) = \mu^B$. Hence, $\alpha$ completely dictates the variations of $\mu(\rho)$. Let us study the properties of the function $\alpha$ defined on $\mathbb{R}_+^*$. First, applying l'Hôpital's rule again, its limits are given by

$$\lim_{\rho \to 0} \alpha(\rho) = \frac{u_0^0 - u_0^1 - (u_1^1 - u_1^0)}{2} = \frac{1}{2}(u_0 - u_1)$$

and

$$\lim_{\rho \to +\infty} \alpha(\rho) = u_0^0 - u_1^1 = u_{\max}.$$
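
The logistic differential equation can be sanity-checked with a finite-difference approximation of the derivative of the threshold. This is a rough numerical sketch under assumed payoffs; the function names, the step size and the payoff values are illustrative choices, and `uAS` again denotes the payoff of action A in state S.

```python
import math

def mu(rho, u00, u10, u11, u01):
    """Wishful belief threshold as a function of rho."""
    num = math.exp(rho * u00) - math.exp(rho * u10)
    return num / (num + math.exp(rho * u11) - math.exp(rho * u01))

def alpha(rho, u00, u10, u11, u01):
    """Varying coefficient of the logistic equation."""
    n = math.exp(rho * u00) - math.exp(rho * u10)
    m = math.exp(rho * u11) - math.exp(rho * u01)
    dn = u00 * math.exp(rho * u00) - u10 * math.exp(rho * u10)
    dm = u11 * math.exp(rho * u11) - u01 * math.exp(rho * u01)
    return dn / n - dm / m

def ode_residual(rho, payoffs, step=1e-5):
    """Central-difference slope of mu minus alpha * mu * (1 - mu)."""
    slope = (mu(rho + step, *payoffs) - mu(rho - step, *payoffs)) / (2 * step)
    level = mu(rho, *payoffs)
    return slope - alpha(rho, *payoffs) * level * (1 - level)

payoffs = (1.0, 0.0, 2.0, 0.5)  # assumed (u00, u10, u11, u01)
residuals = [ode_residual(r, payoffs) for r in (0.5, 1.0, 2.0)]
alpha_small = alpha(1e-3, *payoffs)  # near (u00 + u10 - u01 - u11) / 2
alpha_large = alpha(40.0, *payoffs)  # near u00 - u11
```

The residuals are zero up to discretization error, and the two extreme evaluations of `alpha` approach the limits computed with l'Hôpital's rule.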
Second, after rearranging terms, its derivative is given by

$$\alpha'(\rho) = -\frac{1}{2}\,\frac{(u_0^0 - u_1^0)^2}{\cosh(\rho(u_0^0 - u_1^0)) - 1} + \frac{1}{2}\,\frac{(u_1^1 - u_0^1)^2}{\cosh(\rho(u_1^1 - u_0^1)) - 1},$$

for any $\rho \in \mathbb{R}_+^*$, where $\cosh$ is the hyperbolic cosine function defined by

$$\cosh(x) = \frac{e^{x} + e^{-x}}{2}$$

for any $x \in \mathbb{R}$. Remark that the function defined by

$$f(x) = \frac{x^2}{\cosh(\rho x) - 1} \tag{B.3}$$

is strictly decreasing on $\mathbb{R}_+^*$. So, we have $\alpha'(\rho) < 0$, and therefore $\alpha$ strictly decreasing, for all $\rho \in \mathbb{R}_+^*$ if and only if $u_1^1 - u_0^1 > u_0^0 - u_1^0$. Accordingly, $\alpha$ is a strictly monotonic function if and only if $u_1^1 - u_0^1 \neq u_0^0 - u_1^0$. Hence, excluding the extreme case where $u_0^0 = u_1^1$ and $u_1^0 = u_0^1$, so that $\alpha'(\rho) = 0$ and $\mu(\rho) = \mu^B$ for all $\rho \in \mathbb{R}_+^*$, three interesting cases arise, all depicted on figure B.1 for different payoff matrices:


(i) If $u_{\max} < 0$, the function $\alpha$ has a constant sign for any $\rho \in \mathbb{R}_+^*$ if and only if $u_0 < u_1$, in which case $\mu^W$ is strictly decreasing from $\mu^B$ to 0. In case $u_0 > u_1$, $\alpha$ has a varying sign, so $\mu^W$ starts from $\mu^B$ and is first strictly increasing and then strictly decreasing toward 0.

(ii) If $u_{\max} = 0$, the function $\alpha$ has a constant sign for any $\rho \in \mathbb{R}_+^*$. In this case $\mu^W$ is strictly increasing from $\mu^B$ to 1/2 if and only if $u_0 > u_1$.

(iii) If $u_{\max} > 0$, the function $\alpha$ has a constant sign for any $\rho \in \mathbb{R}_+^*$ if and only if $u_0 > u_1$, in which case $\mu^W$ is strictly increasing from $\mu^B$ to 1. In case $u_0 < u_1$, $\alpha$ has a varying sign, so $\mu^W$ starts from $\mu^B$ and is first strictly decreasing and then strictly increasing toward 1.

Accordingly, in case $\mu^W$ is non-monotonic in $\rho$, there always exists some $\bar\rho > 0$ such that $\mu^W(\bar\rho) = \mu^B$. This concludes the proof.

Figure B.1: Functions $\alpha$ and $\mu^W$ for different payoff matrices $(u_a^\theta)_{(a,\theta) \in A \times \Theta}$. Panels: (a) functions $\alpha$ and $\mu^W$ when $u_{\max} < 0$; (b) functions $\alpha$ and $\mu^W$ when $u_{\max} = 0$; (c) functions $\alpha$ and $\mu^W$ when $u_{\max} > 0$. Action $a = 1$ is favored by a wishful Receiver whenever $\mu^W < \mu^B$.


B.4. PROOF FOR PROPOSITION 8


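Assume $|\Theta| = n$ where $2 \le n < \infty$. We want to show that $\Delta_1^B \subset \Delta_1^W$ if, and only if, the payoff matrix $(u(a,\theta))_{(a,\theta) \in A \times \Theta}$ and the wishfulness $\rho$ verify at least one of properties (i), (ii) or (iii) in lemma 11 for every pair of states $\theta, \theta' \in \Theta$.


Extreme point representation for $\Delta_1^B$ and $\Delta_1^W$. First, remark that $\Delta_1^B$ and $\Delta_1^W$ are both convex polytopes in $\mathbb{R}^{|\Theta|}$ defined by

$$\Delta_1^B = \Delta(\Theta) \cap \Big\{ \mu \in \mathbb{R}^{|\Theta|} \;\Big|\; \sum_{\theta \in \Theta} u(1,\theta)\mu(\theta) \ge \sum_{\theta \in \Theta} u(a',\theta)\mu(\theta), \ \forall a' \in A \Big\},$$

and

$$\Delta_1^W = \Delta(\Theta) \cap \Big\{ \mu \in \mathbb{R}^{|\Theta|} \;\Big|\; \sum_{\theta \in \Theta} \exp(\rho u(1,\theta))\mu(\theta) \ge \sum_{\theta \in \Theta} \exp(\rho u(a',\theta))\mu(\theta), \ \forall a' \in A \Big\}.$$

The sets $\Delta_1^B$ and $\Delta_1^W$ are thus compact and convex sets in $\mathbb{R}^{|\Theta|}$ with finitely many extreme points. Let us now characterize the sets of extreme points of $\Delta_1^B$ and $\Delta_1^W$. For any $\mu \in \mathbb{R}^{|\Theta|}$, define the systems of equations

$$A^B \cdot \mu = b, \quad \mu \ge 0$$

and

$$A^W \cdot \mu = b, \quad \mu \ge 0,$$

where

$$A^B = \begin{pmatrix} u^B(\theta_1) & \dots & u^B(\theta_n) \\ 1 & \dots & 1 \end{pmatrix} \quad \text{and} \quad A^W = \begin{pmatrix} u^W(\theta_1) & \dots & u^W(\theta_n) \\ 1 & \dots & 1 \end{pmatrix}$$

are $2 \times n$ matrices, where $u^B(\theta) = u(1,\theta) - u(0,\theta)$ and $u^W(\theta) = \exp(\rho u(1,\theta)) - \exp(\rho u(0,\theta))$ for any $\theta \in \Theta$, and

$$b = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

In what follows, we always assume that $(u^B(\theta))_{\theta \in \Theta}$ and $(u^W(\theta))_{\theta \in \Theta}$ are such that $\operatorname{rank}(A^B) = \operatorname{rank}(A^W) = 2$.¹ Let us recall some mathematical preliminaries.


Definition 10 (Basic feasible solution). Let $\theta, \theta' \in \Theta$ be any pair of states. A vector $\mu^*$ is a basic feasible solution to $A^B \cdot \mu = b$ (resp. $A^W \cdot \mu = b$), $\mu \ge 0$, for $\theta, \theta'$ if $A^B \cdot \mu^* = b$ (resp. $A^W \cdot \mu^* = b$), $\mu^*(\theta), \mu^*(\theta') \ge 0$ and $\mu^*(\theta'') = 0$ for any $\theta'' \neq \theta, \theta'$.


Lemma 15 (Extreme point representation for convex polyhedra). A vector $\mu \in \mathbb{R}^{|\Theta|}$ is an extreme point of the convex polyhedron $\Delta_1^B$ (resp. $\Delta_1^W$) if, and only if, $\mu$ is a basic feasible solution to $A^B \cdot \mu = b$, $\mu \ge 0$ (resp. $A^W \cdot \mu = b$, $\mu \ge 0$).


Proof. See Panik (1993), Theorem 8.4.1. □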


Therefore, to find the extreme points of $\Delta_1^B$, we just have to solve the system of equations

$$\begin{cases} \mu(\theta)u^B(\theta) + \mu(\theta')u^B(\theta') = 0 \\ \mu(\theta) + \mu(\theta') = 1 \\ \mu(\theta), \mu(\theta') \ge 0 \end{cases} \tag{B.4}$$

for any pair of states $\theta, \theta'$. When either $\mu(\theta) = 0$ or $\mu(\theta') = 0$, the solution to (B.4) is given by the Dirac measure $\delta_\theta$ only if $u^B(\theta) \ge 0$. Denote by $\mathcal{E}^B$ the set of such beliefs. The set $\mathcal{E}^B$ then corresponds to the set of degenerate beliefs under which a Bayesian Receiver would take action $a = 1$. Now, if $\mu(\theta), \mu(\theta') > 0$ then the solution to (B.4) is given by

$$\mu^B_{\theta,\theta'} = \frac{u(0,\theta') - u(1,\theta')}{u(0,\theta') - u(1,\theta') + u(1,\theta) - u(0,\theta)}.$$

Such a belief is exactly the belief on the edge of the simplex between $\delta_\theta$ and $\delta_{\theta'}$ at which a Bayesian decision-maker is indifferent between actions $a = 0$ and $a = 1$. Denote by $\mathcal{I}^B$ the set of such beliefs. Hence, we have

$$\operatorname{ext}(\Delta_1^B) = \mathcal{E}^B \cup \mathcal{I}^B.$$

Following the same procedure, the set of extreme points of $\Delta_1^W$ is given by $\mathcal{E}^W \cup \mathcal{I}^W$, where $\mathcal{E}^W$ is the set of degenerate beliefs at which $u^W(\theta) \ge 0$ and $\mathcal{I}^W$ is the set of beliefs

$$\mu^W_{\theta,\theta'}(\rho) = \frac{\exp(\rho u(0,\theta')) - \exp(\rho u(1,\theta'))}{\exp(\rho u(0,\theta')) - \exp(\rho u(1,\theta')) + \exp(\rho u(1,\theta)) - \exp(\rho u(0,\theta))},$$

for any $\theta, \theta' \in \Theta$. Now, applying the Krein-Milman theorem, we can state that

$$\Delta_1^B = \operatorname{co}(\mathcal{E}^B \cup \mathcal{I}^B)$$

and

$$\Delta_1^W = \operatorname{co}(\mathcal{E}^W \cup \mathcal{I}^W).$$

¹ This amounts to assuming that payoffs are not constant across states.

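Sufficiency. Assume the payoff matrix $(u(a,\theta))_{(a,\theta) \in A \times \Theta}$ and the wishfulness $\rho$ verify at least one of properties (i), (ii) or (iii) in lemma 11 for every pair of states $\theta, \theta' \in \Theta$. Therefore, we have $\mu^W_{\theta,\theta'}(\rho) \le \mu^B_{\theta,\theta'}$ for any $\theta, \theta' \in \Theta$. This implies $\mathcal{I}^B \subset \Delta_1^W$, since action $a = 1$ is favored by a wishful Receiver on each edge of the simplex. Moreover, it is trivially satisfied that $\mathcal{E}^B = \mathcal{E}^W$. Hence, since any point in $\Delta_1^B$ can be written as a convex combination of points in $\mathcal{E}^B \cup \mathcal{I}^B \subset \Delta_1^W$ and $\Delta_1^W$ is convex, it follows that $\Delta_1^B \subset \Delta_1^W$.


Necessity. Assume now that $\Delta_1^B \subset \Delta_1^W$. Therefore, we have $\mu^W_{\theta,\theta'}(\rho) \le \mu^B_{\theta,\theta'}$ for any $\theta, \theta' \in \Theta$, which implies that $(u(a,\theta))_{(a,\theta) \in A \times \Theta}$ and the wishfulness $\rho$ verify at least one of properties (i), (ii) or (iii) in lemma 11 for every pair of states $\theta, \theta' \in \Theta$.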
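
To make the edge-by-edge argument concrete, the sketch below computes the Bayesian and wishful indifference beliefs on every relevant edge of a three-state simplex and checks the inclusion on a grid of beliefs. The payoff matrix, the value of `rho`, the grid and all function names are assumptions made for illustration. Each threshold is read as the mass on the state in which action 1 is strictly better.

```python
import itertools
import math

def edge_thresholds(u, rho):
    """Indifference beliefs on the edges of the simplex.

    u[a][k] stands for u(a, theta_k). For every ordered pair (k, l) such that
    action 1 is strictly better in state k and action 0 strictly better in
    state l, return the mass on state k at which a Bayesian (first entry)
    and a wishful Receiver (second entry) are indifferent.
    """
    out = {}
    for k, l in itertools.permutations(range(len(u[0])), 2):
        gain_b = u[1][k] - u[0][k]
        loss_b = u[0][l] - u[1][l]
        if gain_b > 0 and loss_b > 0:
            gain_w = math.exp(rho * u[1][k]) - math.exp(rho * u[0][k])
            loss_w = math.exp(rho * u[0][l]) - math.exp(rho * u[1][l])
            out[(k, l)] = (loss_b / (loss_b + gain_b),
                           loss_w / (loss_w + gain_w))
    return out

def takes_action_one(u, belief, rho=None):
    """True if action 1 is weakly preferred; rho=None means Bayesian."""
    if rho is None:
        score = sum(b * (u1 - u0) for b, u1, u0 in zip(belief, u[1], u[0]))
    else:
        score = sum(b * (math.exp(rho * u1) - math.exp(rho * u0))
                    for b, u1, u0 in zip(belief, u[1], u[0]))
    return score >= 0

u = [[0.0, 0.0, 0.0], [2.0, 1.0, -1.0]]  # assumed payoffs u[a][state]
rho = 1.0                                # assumed wishfulness
thresholds = edge_thresholds(u, rho)
grid = [(i / 20, j / 20, (20 - i - j) / 20)
        for i in range(21) for j in range(21 - i)]
# Every grid belief at which a Bayesian takes action 1 should also lead a
# wishful Receiver to take action 1.
included = all(takes_action_one(u, b, rho)
               for b in grid if takes_action_one(u, b))
```

In this example the wishful threshold lies below the Bayesian one on both relevant edges, and the grid check finds no belief at which the Bayesian Receiver takes action 1 while the wishful Receiver does not.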


B.5. PROOF FOR PROPOSITION | 1 


First, note that we can always index the voters in ascending order of $\beta$, such that $\eta(\mu,\beta^i) \ge \eta(\mu,\beta^j)$ for all $\mu \in \Delta(\Theta)$ whenever $i < j$, so that

$$\pi(\mu) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \eta(\mu,\beta^i) - \eta(\mu,\beta^j)$$

does indeed represent the absolute difference between each pair of beliefs. Now, remark that the sum can be rearranged in the following way:

$$\begin{aligned} \pi(\mu) &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \eta(\mu,\beta^i) - \eta(\mu,\beta^j) \\ &= (n-1)\eta(\mu,\beta^1) + (n-2)\eta(\mu,\beta^2) - \eta(\mu,\beta^2) + \dots \\ &\quad + \eta(\mu,\beta^{n-1}) - (n-2)\eta(\mu,\beta^{n-1}) - (n-1)\eta(\mu,\beta^n) \\ &= \sum_{i=1}^{m} (n+1-2i)\left(\eta(\mu,\beta^i) - \eta(\mu,\beta^{n+1-i})\right), \end{aligned}$$

for any $\mu \in [0,1]$, where $m = (n+1)/2$. That is, we can express it in terms of the differences in beliefs among voters who are equidistant from the median. To see that this is true, we need to first realize that each belief appears $n-1$ times in equation (2.6) (since each belief is paired once with each of the other $n-1$ beliefs). The beliefs of voters below the median appear more often as positive than negative (the belief of the first voter is positive in all of its pairings, the belief of the second voter is positive in all of its pairings except for the pairing with the first voter, etc.), whereas the beliefs of voters above the median are more often negative than positive. If we rearrange the terms of the sum in order to pair symmetric voters, the term $(\eta(\mu,\beta^1) - \eta(\mu,\beta^n))$ appears $n-1$ times, whereas the term $(\eta(\mu,\beta^2) - \eta(\mu,\beta^{n-1}))$ appears $n-3$ times, since out of the $n-1$ times $\eta(\mu,\beta^2)$ appears in equation (2.6), $n-2$ of them are positive and 1 is negative (the converse is true for $\eta(\mu,\beta^{n-1})$). One can continue the same reasoning for all the pairs of symmetric voters, and get to the formulation of $\pi(\mu)$ presented above. Note, also, that the belief of the median voter is summed and subtracted at the same rate, such that it does not matter in our measure of polarization.
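
The rearrangement of the double sum into symmetric pairs is a purely algebraic identity, which the following sketch checks on an arbitrary decreasing list of beliefs. The belief values and function names are made up for illustration.

```python
def pairwise_sum(beliefs):
    """Sum of beliefs[i] - beliefs[j] over all pairs i < j."""
    n = len(beliefs)
    return sum(beliefs[i] - beliefs[j]
               for i in range(n) for j in range(i + 1, n))

def symmetric_sum(beliefs):
    """Same quantity written over pairs equidistant from the median."""
    n = len(beliefs)
    m = (n + 1) // 2
    # With 1-based index i, the pair (i, n + 1 - i) carries weight n + 1 - 2i.
    return sum((n + 1 - 2 * i) * (beliefs[i - 1] - beliefs[n - i])
               for i in range(1, m + 1))

beliefs = [0.9, 0.7, 0.55, 0.4, 0.1]  # assumed beliefs, sorted decreasingly
lhs = pairwise_sum(beliefs)
rhs = symmetric_sum(beliefs)
```

With five voters the median belief (here 0.55) receives weight zero, so changing it leaves both sums unchanged.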

Consider the distance between the beliefs of any pair of symmetric voters, $\eta(\mu,\beta^i) - \eta(\mu,\beta^{n+1-i})$ for $i \in \{1,\dots,m\}$. Given our symmetry assumption these two agents are symmetric, such that $\beta^i = 1 - \beta^{n+1-i}$. It is not difficult to show that any of those pairwise distances is maximized when agent $i$ is distorting its belief upwards and agent $n+1-i$ is distorting its belief downwards. That is, when $\mu \in [\mu^W(\beta^i), \mu^W(\beta^{n+1-i})]$.

First, the distance between symmetric beliefs in such an interval can be rewritten as

$$\eta(\mu,\beta^i) - \eta(\mu,\beta^{n+1-i}) = \frac{\mu \exp(\rho\beta^i)}{\mu \exp(\rho\beta^i) + (1-\mu)} - \frac{\mu}{\mu + (1-\mu)\exp(\rho\beta^i)}$$

for any $i \in \{1,\dots,m\}$ and $\mu \in [\mu^W(\beta^i), \mu^W(\beta^{n+1-i})]$.
Second, by taking the first order condition in this interval and rearranging it we get

$$\frac{\mu + (1-\mu)\exp(\rho\beta^i)}{\mu \exp(\rho\beta^i) + (1-\mu)} = 1,$$

such that the difference between symmetric beliefs is maximized uniquely at

$$\mu = \mu^W(\beta^m) = \frac{1}{2},$$

for any $i \in \{1,\dots,m\}$, $\beta^i \in \,]0,1[$ and any $\rho \in \mathbb{R}_+^*$. Since

$$\mu^W(\beta^m) = \arg\max_{\mu \in [0,1]} \eta(\mu,\beta^i) - \eta(\mu,\beta^{n+1-i})$$

for any $i \in \{1,\dots,m\}$, we get

$$\mu^W(\beta^m) = \arg\max_{\mu \in [0,1]} \pi(\mu),$$

which concludes the proof.
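
The first order condition can be cross-checked by brute force. The sketch below evaluates the distance between two symmetric distorted beliefs on a grid of priors and locates its maximizer. The grid resolution and the values of `rho` and `beta` are arbitrary assumptions, and the closed form used is the one displayed in the proof.

```python
import math

def symmetric_gap(mu, rho, beta):
    """Distance between an upward- and a downward-distorted belief."""
    e = math.exp(rho * beta)
    upward = mu * e / (mu * e + (1 - mu))
    downward = mu / (mu + (1 - mu) * e)
    return upward - downward

grid = [k / 1000 for k in range(1001)]  # priors between 0 and 1
argmaxes = {
    (rho, beta): max(grid, key=lambda m: symmetric_gap(m, rho, beta))
    for rho in (0.5, 2.0, 5.0)
    for beta in (0.2, 0.8)
}
```

For every assumed pair of parameters the grid maximizer is the prior 1/2, consistent with the unique solution of the first order condition.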


B.6. PROOF FOR PROPOSITION 10 


First, we define the function 


6 
W2)= ay | explo) fo) 


for any z € [9, @[ and adopt the convention that (6) = exp(p@). It is not difficult 
to show that w is a continuous and strictly increasing function from w(@) =< < 1 
to w(@) = exp(p@). Define similarly the function 


i 


~(z) = 1—-F@ 


/ " af (6) 4, 
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for any z € [@,@[ and y(@) = @. Again, it is not difficult to show that y is a 
continuous and strictly increasing function from y(@) = m < 0 to y(@) = @. 

Since yw is strictly increasing, it thus suffices to show that y(02) > 1 = w(@") 
to prove that 9” < 98. Applying Jensen’s inequality, it follows that 


w(z) > exp(p¢y(z)), 


for any z € ]0, @[, where the strict inequality comes from the strict convexity of 
Zz + exp(pz) and the non degeneracy of F. In particular, Jensen’s inequality holds 
with equality at @ and 6, but, by the intermediate value theorem, it must be that 0? 
(as well as 6”) lie in the open interval |, O[. Thus, we have 


w(0") > 1, 


since y(07) = 0 and 68 # 6, 0. 
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C. MATHEMATICAL APPENDIX FOR 
CHAPTER 3 


C.1. PROOFS FOR SECTION 3.4.1 


C.1.1. Proof of lemma 12 


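Let $\sigma \in \Sigma(\mu^0)$ and suppose that there exist $\mu, \mu' \in \operatorname{supp}(\sigma)$ with $p(\mu) = p(\mu')$. Consider the following market:

$$\hat\mu = \frac{\sigma(\mu)}{\sigma(\mu) + \sigma(\mu')}\,\mu + \frac{\sigma(\mu')}{\sigma(\mu) + \sigma(\mu')}\,\mu'.$$

By the convexity of $M_{p(\mu)}$, $p(\hat\mu) = p(\mu)$. Define $\sigma'$ in the following way: $\sigma'(\hat\mu) = \sigma(\mu) + \sigma(\mu')$, $\sigma'(\mu) = \sigma'(\mu') = 0$ and $\sigma' = \sigma$ otherwise. It is easy to check that $\sum_{\mu \in \operatorname{supp}(\sigma)} \sigma(\mu) W(\mu) = \sum_{\mu \in \operatorname{supp}(\sigma')} \sigma'(\mu) W(\mu)$. We can iterate this operation as many times as the number of pairs $\nu, \nu' \in \operatorname{supp}(\sigma')$ such that $p(\nu) = p(\nu')$ to finally obtain the desired conclusion.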


C.1.2. Proof of lemma 13 


Let $\mu^0$ be an inefficient aggregate market, hence for any optimal segmentation $\sigma \in \Sigma(\mu^0)$, $|\operatorname{supp}(\sigma)| \ge 2$. Let $\sigma$ be a direct and optimal segmentation of $\mu^0$ and $\mu \in \operatorname{supp}(\sigma)$ such that $\mu$ is in the interior of $M_{p(\mu)}$. Let $\nu$ be any other market in the support of $\sigma$. Consider the market:

$$\xi = \frac{\sigma(\mu)}{\sigma(\mu) + \sigma(\nu)}\,\mu + \frac{\sigma(\nu)}{\sigma(\mu) + \sigma(\nu)}\,\nu.$$

Because $\mu^0$ is inefficient, it is without loss of generality to assume that $\xi$ is also inefficient.

Denote by $\bar\mu$ (resp. $\bar\nu$) the projection of $\xi$ on the boundary of the simplex $M$ in the direction of $\mu$ (resp. $\nu$). For $\sigma$ to be optimal, the segmentation of $\xi$ between $\mu$ with probability $\sigma(\mu)/(\sigma(\mu) + \sigma(\nu))$ and $\nu$ with probability $\sigma(\nu)/(\sigma(\mu) + \sigma(\nu))$ must be optimal. In particular, it must be optimal among all segmentations on $[\bar\mu, \bar\nu]$.

There exists a one-to-one mapping $f \colon [\bar\mu, \bar\nu] \to [0,1]$ such that for any $\gamma \in [\bar\mu, \bar\nu]$, $\gamma = f(\gamma)\bar\mu + (1 - f(\gamma))\bar\nu$. Thus, the set $[\bar\mu, \bar\nu]$ can be seen as the set of all distributions on a binary set of states of the world $\{\bar\mu, \bar\nu\}$, where for any $\gamma \in [\bar\mu, \bar\nu]$, $f(\gamma)$ is the probability of $\bar\mu$.

Therefore, the maximization program,

$$\max_{\sigma \in \Sigma^{[\bar\mu,\bar\nu]}(\xi)} \sum_{\gamma \in \operatorname{supp}(\sigma)} \sigma(\gamma) W(\gamma) \tag{S}$$

where

$$\Sigma^{[\bar\mu,\bar\nu]}(\xi) = \Big\{ \sigma \in \Delta([\bar\mu,\bar\nu]) \;\Big|\; \sum_{\gamma \in \operatorname{supp}(\sigma)} \sigma(\gamma)\gamma = \xi \Big\},$$

is a Bayesian persuasion problem (Kamenica and Gentzkow, 2011), with a binary state of the world and a finite number of actions. Hence, applying theorem 1 in Lipnowski and Mathevet (2017), there exists an optimal segmentation only supported on extreme points of sets $M \in \mathcal{M}^{[\bar\mu,\bar\nu]} := \{M_k \cap [\bar\mu,\bar\nu] \mid k \in K \text{ and } M_k \cap [\bar\mu,\bar\nu] \neq \emptyset\}$. It happens that for any $M \in \mathcal{M}^{[\bar\mu,\bar\nu]}$, so that $M = M_k \cap [\bar\mu,\bar\nu]$ for some $k$, if $\gamma$ is an extreme point of $M$, then it is on the boundary of $M_k$.

Let $(\mu', \nu')$ with respective probabilities $(\alpha, 1-\alpha)$ be a solution to (S) where $\mu'$ and $\nu'$ are extreme points of some $M \in \mathcal{M}^{[\bar\mu,\bar\nu]}$. We now consider the segmentation $\tilde\sigma$ such that $\tilde\sigma(\gamma) = \sigma(\gamma)$ for all $\gamma \in \operatorname{supp}(\sigma) \setminus \{\mu, \nu\}$, $\tilde\sigma(\mu') = (\sigma(\mu) + \sigma(\nu))\alpha$, $\tilde\sigma(\nu') = (\sigma(\mu) + \sigma(\nu))(1 - \alpha)$, and $\tilde\sigma = 0$ otherwise. One can easily check that $\tilde\sigma \in \Sigma(\mu^0)$. If $\tilde\sigma$ is not direct, that is, there exists $\gamma \in \operatorname{supp}(\tilde\sigma)$ such that (w.l.o.g.) $p(\gamma) = p(\mu')$, then construct a direct segmentation $\hat\sigma$ following the same process as in the proof of lemma 12. Then, if $\hat\sigma$ is not only supported on boundaries of sets $\{M_k\}_{k \in K}$, reiterate the same process as above, until the desired conclusion is reached.


C.2. PROOFS FOR SECTION 3.4.2. 


C.2.1. Proof for proposition 13 


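Fix an aggregate market $\mu^0$ and let $\sigma \in \Sigma(\mu^0)$ be optimal and direct. Suppose by contradiction that there exist $\mu, \mu' \in \operatorname{supp}(\sigma)$ such that $v_a := \min\{\operatorname{supp}(\mu)\} < \max\{\operatorname{supp}(\mu')\} =: v_d$ and $v_b := \min\{\operatorname{supp}(\mu')\} < \max\{\operatorname{supp}(\mu)\} =: v_c$. Assume further, without loss of generality, that $\min\{\operatorname{supp}(\mu)\} < \min\{\operatorname{supp}(\mu')\}$.

Let us define

$$\bar\mu = \frac{\sigma(\mu)}{\sigma(\mu) + \sigma(\mu')}\,\mu + \frac{\sigma(\mu')}{\sigma(\mu) + \sigma(\mu')}\,\mu'.$$

If $\sigma$ is optimal, then we have

$$V(\bar\mu) = \frac{\sigma(\mu)}{\sigma(\mu) + \sigma(\mu')}\,W(\mu) + \frac{\sigma(\mu')}{\sigma(\mu) + \sigma(\mu')}\,W(\mu').$$

The proof consists in showing that we can improve on this splitting of $\bar\mu$ and thus obtain a contradiction.
Let $\varepsilon > 0$, and define $\tilde\mu$, $\tilde\mu'$ as follows:

$$\tilde\mu_k = \begin{cases} \mu_b + \varepsilon & \text{if } k = b \\ \mu_c - \varepsilon & \text{if } k = c \\ \mu_k & \text{otherwise,} \end{cases}$$

and,

$$\tilde\mu'_k = \begin{cases} \mu'_b - \frac{\sigma(\mu)}{\sigma(\mu')}\varepsilon & \text{if } k = b \\ \mu'_c + \frac{\sigma(\mu)}{\sigma(\mu')}\varepsilon & \text{if } k = c \\ \mu'_k & \text{otherwise.} \end{cases}$$

By construction, we have

$$\frac{\sigma(\mu)}{\sigma(\mu) + \sigma(\mu')}\,\tilde\mu + \frac{\sigma(\mu')}{\sigma(\mu) + \sigma(\mu')}\,\tilde\mu' = \bar\mu.$$

Note that $v_a$ is still an optimal price for $\tilde\mu$. Indeed, for any $v_a \le v_k \le v_b$, the profit made by fixing price $v_k$ is equal in markets $\mu$ and $\tilde\mu$, and for any $v_b < v_k \le v_c$ the profit made by fixing price $v_k$ is strictly lower in $\tilde\mu$ than in $\mu$. On the contrary, $p(\tilde\mu') \ge p(\mu')$ and it is possible that the inequality holds strictly. In any case, it must be that $p(\tilde\mu') = v_e$ for $b \le e \le d$. Denote $\alpha := \sigma(\mu)/(\sigma(\mu) + \sigma(\mu'))$, hence $\sigma(\mu)/\sigma(\mu') = \alpha/(1 - \alpha)$. Remark that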

aW (jt) + (1— a) W(x") — (@W(u) + (1 - a) W(p’)) 
= a(W(a) - W(u)) + (1 - @) (We) — WW), 


that 
W(a) — W(u) = E(Ap(vo — Va) —Ac(Ve - Va))s 


it 


and that 


W(’)-W(W’) = [- > Aidt (Ve —vVp)- » AnH, Vk - Vo) + Ac (—}) E(Ve - 2) ; 


k>e b<k<e 


Hence, rearranging terms, we obtain: 


aW (ji) + (1 — a) W(t’) — (@W(u) + (1 - a) Wp’) = 
ae(Ap(vp — Va) a AAV: = Va)) 


+(1-a) Soave mn) > talon vo) 42 (7a) lee) 


k>e b<k<e 


which simplifies into 
aW (fi) + (1 a) Wi’) — (aW(u) +1 - @)W (pr) = 


@EAp(Vp-Va)-@EAC(Ve-Va)-(1-@) ») AkM,.(Ve — Vo) + oy AM (Vk - v0 


k>e b<k<e 


But remark that 


weds (Vp - Va) — @EA¢(Ve — Va) - (1 - @) [Sy aur v9 > ashi 09) 


k>e b<k<e 


>@éAp(Vp — Va) — MEAD (Ve — Va) — 1 - @) ») Abst fi, (Ve — Vb) + > Api VE - v0] 


k>e b<k<e 
=@EA} (Vp — Va) — Apt |@E(Ve — Va) — 1 - @) ») LH, (Ve — Vp) + iy ME - v9} 
k>e b<k<e 
(C.1) 

Finally, 

Ap 

(C.1)>0 => >K 

b+1 

where 


7 @E(Ve — Va) — (1 - a)( Lik>e H).(Ve =Vp) tees Hi (Vk — vp) 
= @é(Vp — Va) 


which ends the proof.


C.3. PROOFS OF SECTION 3.4.3.


C.3.1. Proof of proposition 14 


As argued in the core of the text, all markets with uniform price v, belonging to 
no-rent region must be optimally segmented by splitting z* between 


Hi M5 
; | be ’ 3 Os | 
oC CO 
and . : 
Ll fa 
= 0054s Jas ee 
l-o l-o 


Such a segmentation indeed gives no rents to the monopolist if v,, is an optimal 
price in both y* and yu”. That is, if: 


u-l x 
neat 20S, Aug), V2<j<u-1 (NR-s) 
i=j 
and 
Sat 
w20(S AE). Vutl<j<K (NR-r) 
i=j 


As such, any optimal segmentation under strong redistributive preferences that 
maximizes consumer surplus must have 


s _ Vl 
My = —_> 
Vu 
and 
u-1 
Viz : 
O= H; > 
Vu — V1 4 
i=l 
as well as 


,  MaYu — Dj HEV 
u~ K * . 
Dey HM; Vu — V1 
These conditions pin down the segmentation 7*, Conditions (NR-s) and (NR-r) 
are satisfied whenever o'® is efficient, which concludes the proof. 
It is also interesting to note that conditions (NR-s) and (NR-r) define the no-rent 


region inside M,, as a convex polytope. Indeed, we can rearrange both conditions 
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and get: 


joi u-1 
O>-a; )ujp+(l-a;) ui V2sjsu-l (NR-s) 
i=l =j 
and 
jJ-1 K 


-B; )ui+(1-B;) up Vutlsj<K (RY 
i=] 


-—_—_ > 
Vj(Vu — V1) = 


where 
Ass ViQva- vj) 
: vju —v1)’ 
and 
Eee: 
d vj(Vu —vi) 


The conditions expressed above define $K - 2$ half-spaces in $\mathbb{R}^K$. The no-rent region in $M_u$ is thus given by the closed polytope defined by the intersection of these half-spaces. We can represent this polytope as follows:


Ty ={we M,|A-p<z}, 


with 
_{S Os p(K-2)xK 
Or R 
and 
0 
0 
Z= v1 ers? 


“t Vu+l(Vu-V1) 
eet = 
VK(Vu-V1) 


where Og and Og are null matrices with respective dimensions (u — 2) x (u — 1) 
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and (K —u) xX (K + 1—u), and where 


—-a2 l-a vay l-a l-a 
-—a3 —-a3 «++ Ll-a 1-a3 
S= ; € RURDxWAD, 
—MHy-2 —Ay-2 +++ 1-ay2 1-ay-2 
~@My-1 ~@y-1 *"° =dy2{. b=ay4 


and 


—Buti l= Buti as LS Buti | Buti 
—Bu+2 —Bu+2 oes Le Bu+2 I Bu+2 


p(k u)x(K+1—u) 


=—De.: Apes oes De peep Lope 
=PR Spee 23%, SBR 1 — Bx 
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